
Latest Publications: 2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)

Cuckoo Node Hashing on GPUs
Pub Date : 2022-07-01 DOI: 10.1109/ISPDC55340.2022.00013
Muhammad Javed, Hao Zhou, David Troendle, Byunghyun Jang
The hash table finds numerous applications in many different domains, but its potential for non-coalesced memory accesses and its execution-divergence characteristics impose optimization challenges on GPUs. We propose a novel hash table design, referred to as Cuckoo Node Hashing, which aims to better exploit the massive data parallelism offered by GPUs. At the core of its design, we leverage Cuckoo Hashing, one of the known hash table design schemes, in a closed-address manner, which, to our knowledge, is the first attempt on GPUs. We also propose an architecture-aware warp-cooperative reordering algorithm that improves memory performance, reduces thread divergence, and efficiently increases the likelihood of coalesced memory accesses in hash table operations. Our experiments show that Cuckoo Node Hashing outperforms and scales better than existing state-of-the-art GPU hash table designs such as DACHash and Slab Hash, with a peak performance of 5.03 billion queries/second in static searching and 4.34 billion insertions/second in static building.
Citations: 0
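For readers skimming the abstract above, the cuckoo-hashing building block it adapts can be sketched in a few lines. This is a plain sequential two-table variant, not the paper's closed-address, GPU-side Cuckoo Node design; the table size, hash mixing, and eviction bound are illustrative assumptions:

```python
class CuckooTable:
    """Two-table cuckoo hash: each key has one candidate slot per table;
    inserts evict the current occupant to its alternate slot."""

    def __init__(self, capacity=11):
        self.cap = capacity
        self.t1 = [None] * capacity
        self.t2 = [None] * capacity

    def _h1(self, key):
        return hash(key) % self.cap

    def _h2(self, key):
        return (hash(key) // self.cap) % self.cap

    def lookup(self, key):
        # at most two probes, independent of load -- the cuckoo guarantee
        return self.t1[self._h1(key)] == key or self.t2[self._h2(key)] == key

    def insert(self, key, max_kicks=32):
        for _ in range(max_kicks):
            i = self._h1(key)
            if self.t1[i] is None:
                self.t1[i] = key
                return True
            self.t1[i], key = key, self.t1[i]   # evict occupant of table 1
            j = self._h2(key)
            if self.t2[j] is None:
                self.t2[j] = key
                return True
            self.t2[j], key = key, self.t2[j]   # evict occupant of table 2
        return False  # eviction cycle: a real table would rehash or grow
```

Lookups touch at most two slots regardless of load factor, which is exactly the bounded, predictable access pattern that makes the scheme attractive for the coalesced, low-divergence memory accesses the paper targets.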
Workload Deployment and Configuration Reconciliation at Scale in Kubernetes-Based Edge-Cloud Continuums
Pub Date : 2022-07-01 DOI: 10.1109/ISPDC55340.2022.00026
D. Hass, Josef Spillner
Continuum computing promises the abstraction of physical node location and node platform stack in order to create seamless application deployment and execution across edges and cloud data centres. For industrial IoT applications, the demand to generate data insights from an installed base of increasingly capable edge devices calls for appropriate continuum computing interfaces. Derived from a case study in industrial water flow monitoring, and based on the industry's de facto standard Kubernetes for deploying complex containerised workloads, we present a continuum deployment mechanism built on custom Kubernetes controllers and CI/CD, called Kontinuum Controller. Through synthetic experiments and a holistic cross-provider deployment, we investigate its scalability, with emphasis on reconciling adjusted configuration per application and per node, a critical requirement of industrial customers. Our findings show that, by default, Kubernetes enters undesirable oscillation even for modestly sized deployments. Thus, we also discuss possible solutions.
Citations: 1
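The reconciliation pattern this abstract builds on, repeatedly diffing desired against observed state and emitting corrective actions, can be sketched independently of Kubernetes. The dict-based state model and action tuples below are illustrative assumptions, not the Kontinuum Controller's actual interface:

```python
def reconcile(desired, observed):
    """Return the actions needed to converge observed state to desired state.
    Both arguments map a workload name to its spec (any comparable value)."""
    actions = []
    for name, spec in desired.items():
        if name not in observed:
            actions.append(("create", name, spec))       # missing workload
        elif observed[name] != spec:
            actions.append(("update", name, spec))       # drifted config
    for name in observed:
        if name not in desired:
            actions.append(("delete", name, None))       # orphaned workload
    return actions
```

If two controllers hold conflicting desired states for the same object, each reconciliation pass undoes the other's update, which gives one intuition for the oscillation the paper observes at scale.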
A hybrid clustering algorithm for high-performance edge computing devices [Short]
Pub Date : 2022-07-01 DOI: 10.1109/ISPDC55340.2022.00020
G. Laccetti, M. Lapegna, D. Romano
Clustering algorithms are efficient tools for discovering correlations or affinities within large datasets and are the basis of several Artificial Intelligence processes operating on data generated by sensor networks. Recently, such algorithms have found an active application area closely tied to the Edge Computing paradigm. The final aim is to move intelligence and decision-making ability near the edge of the sensor networks, thus avoiding the stringent requirements for low-latency and large-bandwidth networks typical of the Cloud Computing model. In this context, the present work describes a new hybrid version of a clustering algorithm for the NVIDIA Jetson Nano board, integrating two different parallel strategies. The algorithm is then evaluated in terms of performance and energy consumption, and compared with two high-end GPU-based computing systems. The results confirm the feasibility of intelligent sensor networks in which decisions are taken at the data collection points.
Citations: 2
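As context for the abstract above, here is the sequential core of a centroid-based clustering kernel. k-means is assumed as a representative example (the paper does not name its exact algorithm); its assignment and update loops are the kind of data-parallel work a hybrid CPU/GPU strategy would split across the Jetson Nano's cores:

```python
def kmeans(points, k, iters=50):
    """Plain 2-D k-means; deterministic init (first k points) for clarity."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iters):
        # assignment step: each point picks its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[j].append(p)
        # update step: move each centroid to its cluster's mean
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids
```

Both inner loops are embarrassingly parallel over points and clusters, which is what makes this family of algorithms a natural fit for the two parallel strategies the paper combines.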
A scalable algorithm for homomorphic computing on multi-core clusters
Pub Date : 2022-07-01 DOI: 10.1109/ISPDC55340.2022.00017
F. Gava, L. Bayati
Homomorphic encryption draws huge attention as it provides a way of performing privacy-preserving computations on encrypted data. Unfortunately, such computations are extremely expensive in both calculation time and memory consumption, and much slower than the corresponding computations on unencrypted data. One solution is parallelism: in this work, we investigate using distributed architectures of interconnected nodes (multi-core clusters) to execute homomorphic computations programmed with the Cingulata environment, a toolchain that generates Boolean circuits (where gates manipulate encrypted Booleans) from homomorphic C++ codes. Such circuits are split into slices, and we use a BSP algorithm to execute each of them.
Citations: 1
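The slicing idea in the abstract, splitting a Boolean circuit into levels whose gates are mutually independent so that each level forms one BSP superstep, can be shown on a plaintext toy evaluator. Real Cingulata circuits operate on encrypted Booleans; the gate encoding below is an illustrative assumption:

```python
def evaluate_in_slices(circuit, inputs):
    """Evaluate a Boolean circuit level by level.
    circuit: gate id -> (op, [operand ids]); operands are gate ids or input names.
    Returns the computed values and the list of slices (levels)."""
    values = dict(inputs)
    remaining = dict(circuit)
    slices = []
    while remaining:
        # a slice = every gate whose operands are already available
        ready = [g for g, (_, ops) in remaining.items()
                 if all(o in values for o in ops)]
        if not ready:
            raise ValueError("cyclic circuit")
        slices.append(ready)
        for g in ready:  # in BSP, these gates would run in parallel
            op, ops = remaining.pop(g)
            a = values[ops[0]]
            b = values[ops[1]] if len(ops) > 1 else None
            values[g] = {"AND": a and b, "XOR": a != b, "NOT": not a}[op]
    return values, slices
```

For a half-adder plus an inverter (s = x XOR y, c = x AND y, out = NOT c), the evaluator produces two slices; the gates inside each slice are the units a BSP algorithm would distribute across processors, with one synchronisation barrier between slices.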
Estimating the Impact of Communication Schemes for Distributed Graph Processing
Pub Date : 2022-07-01 DOI: 10.1109/ISPDC55340.2022.00016
Tian Ye, S. Kuppannagari, C. Rose, Sasindu Wijeratne, R. Kannan, V. Prasanna
Extreme-scale graph analytics is imperative for several real-world Big Data applications whose underlying graph structures contain millions or billions of vertices and edges. Since such huge graphs cannot fit into the memory of a single computer, distributed processing of the graph is required. Several frameworks have been developed for performing graph processing on distributed systems. These frameworks focus primarily on choosing the right computation model and partitioning scheme, under the assumption that such design choices will automatically reduce the communication overheads. For any computational model and partitioning scheme, communication schemes — the data to be communicated and the virtual interconnection network among the nodes — have a significant impact on performance. To analyze this impact, in this work we identify widely used communication schemes and estimate their performance. Analyzing the trade-offs between the number of compute nodes and the communication costs of various schemes on a distributed platform by brute-force experimentation can be prohibitively expensive. Thus, our performance estimation models provide an economical way to perform the analyses, given the partitions and the communication scheme as input. We validate our model on a local HPC cluster as well as the cloud-hosted NSF Chameleon cluster. Using our estimates as well as the actual measurements, we compare the communication schemes and provide conditions under which one scheme should be preferred over the others.
Citations: 0
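A minimal instance of such a performance estimate is the classic alpha-beta (latency-bandwidth) model applied to a partition's cut edges. The constants and the one-update-per-cut-edge assumption below are illustrative, not the paper's calibrated model:

```python
def estimate_comm_time(partition, edges, alpha=1e-6, beta=1e-9,
                       bytes_per_update=8):
    """Alpha-beta estimate of per-superstep communication time:
    one message per communicating node pair (latency term alpha)
    plus the total volume of cross-partition updates (bandwidth term beta)."""
    traffic = {}  # (node_a, node_b) -> number of updates exchanged
    for u, v in edges:
        pu, pv = partition[u], partition[v]
        if pu != pv:  # only cut edges generate communication
            pair = (min(pu, pv), max(pu, pv))
            traffic[pair] = traffic.get(pair, 0) + 1
    messages = len(traffic)
    volume = sum(traffic.values()) * bytes_per_update
    return alpha * messages + beta * volume
```

Such a closed-form estimate is what lets one compare communication schemes across node counts without the brute-force experimentation the abstract calls prohibitively expensive.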
Sponsors and Conference Support
Citations: 0
Performance Comparison of Speculative Taskloop and OpenMP-for-Loop Thread-Level Speculation on Hardware Transactional Memory
Pub Date : 2022-07-01 DOI: 10.1109/ISPDC55340.2022.00021
Juan Salamanca
Speculative Taskloop (STL) is a loop parallelization technique that takes the best of Task-based Parallelism and Thread-Level Speculation to speed up loops with may loop-carried dependencies that were previously difficult for compilers to parallelize. Previous studies show the efficiency of STL when implemented using Hardware Transactional Memory and the advantages it offers compared to a typical DOACROSS technique such as OpenMP ordered. This paper presents a performance comparison between STL and a previously proposed technique that implements Thread-Level Speculation (TLS) in the for worksharing construct (FOR-TLS) over a set of loops from cbench and SPEC2006 benchmarks. The results show interesting insights on how each technique can be more appropriate depending on the characteristics of the evaluated loop. Experimental results reveal that by implementing both techniques on top of HTM, speed-ups of up to 2.41× can be obtained for STL and up to 2× for FOR-TLS.
Citations: 0
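The TLS mechanism compared in this abstract can be mimicked, very loosely, in plain software: run iterations optimistically, validate their reads at in-order commit time, and squash and replay on conflict. Value-based validation here stands in for HTM conflict detection, and the whole sketch is illustrative rather than how STL or FOR-TLS actually work:

```python
def tls_execute(n_iters, body, mem):
    """Toy thread-level speculation over a dict-backed memory.
    Pass 1 runs every iteration against the initial memory (the optimistic,
    'parallel' pass); pass 2 commits in program order, squashing and
    re-executing any iteration whose reads have gone stale."""
    spec = []
    for i in range(n_iters):                      # speculative pass
        reads, writes = {}, {}
        def load(k, reads=reads, writes=writes):
            if k in writes:                       # read-your-own-write
                return writes[k]
            reads[k] = mem.get(k)                 # record validated read
            return reads[k]
        body(i, load, lambda k, v, w=writes: w.__setitem__(k, v))
        spec.append((reads, writes))

    squashes = 0
    for i, (reads, writes) in enumerate(spec):    # in-order commit
        if any(mem.get(k) != v for k, v in reads.items()):
            squashes += 1                         # conflict: squash iteration i
            writes = {}
            def load(k, writes=writes):
                return writes.get(k, mem.get(k))
            body(i, load, lambda k, v, w=writes: w.__setitem__(k, v))  # replay
        mem.update(writes)                        # commit this iteration
    return squashes
```

Running it on a loop with a genuine loop-carried dependence, such as `a[i+1] += a[i]`, yields the correct sequential result while counting how many iterations had to be squashed, which mirrors the abort/retry cost that determines TLS speed-ups on HTM.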
[Full] Deep Heuristic for Broadcasting in Arbitrary Networks
Pub Date : 2022-07-01 DOI: 10.1109/ISPDC55340.2022.00010
Hovhannes A. Harutyunyan, Narek A. Hovhannisyan, Rakshit Magithiya
Broadcasting is an information dissemination problem in a connected graph in which one vertex, called the originator, must distribute a message to all other vertices by placing a series of calls along the edges of the graph. In each round, the already-informed vertices aid the originator in distributing the message. Finding the broadcast time of any vertex in an arbitrary graph is NP-complete. We designed an efficient heuristic that improves on the results of existing heuristics in most cases. Extensive simulations show that our new heuristic outperforms the existing ones for most of the commonly used interconnection networks, as well as on network models generated by the ns-2 network simulator.
Citations: 0
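Under the telephone model described in the abstract, a broadcast-time upper bound can be obtained by round-based simulation in which every informed vertex calls one uninformed neighbour per round. The greedy neighbour choice below is a simple stand-in, not the paper's deep heuristic:

```python
def broadcast_time(adj, origin):
    """Simulate telephone-model broadcasting from 'origin'.
    adj: vertex -> list of neighbours. Each round, every informed vertex
    calls at most one uninformed neighbour; the greedy rule prefers the
    neighbour that can reach the most still-uninformed vertices next round."""
    informed = {origin}
    rounds = 0
    while len(informed) < len(adj):
        calls = {}  # caller -> callee chosen this round
        for v in sorted(informed):
            cands = [u for u in adj[v]
                     if u not in informed and u not in calls.values()]
            if cands:
                calls[v] = max(cands, key=lambda u: sum(w not in informed
                                                        for w in adj[u]))
        if not calls:
            break  # remaining vertices are unreachable
        informed |= set(calls.values())
        rounds += 1
    return rounds
```

On a star with three leaves broadcast from the centre takes three rounds (the centre can call only one leaf per round), matching the known optimum for that graph; a stronger heuristic pays off on graphs where the calling order matters.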
A type system to avoid runtime errors for Multi-ML
Pub Date : 2022-07-01 DOI: 10.1109/ISPDC55340.2022.00015
F. Gava, V. Allombert, J. Tesson
Programming parallel architectures from a hierarchical point of view is becoming today's standard, as machines are structured by multiple layers of memory. To handle such architectures, we focus on the MULTI-BSP bridging model. This model extends BSP and proposes a structured way of programming multi-level architectures. In the context of parallel programming, we now need to manage new concerns such as memory coherency, deadlocks, and safe data communications. To do so, we propose a typing system for MULTI-ML, an ML-like programming language based on the MULTI-BSP model. This type system introduces data locality using type annotations and effects, in order to detect incorrect uses of multi-level architectures. We thus ensure that "well-typed programs cannot go wrong" on hierarchical architectures.
Citations: 0
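The locality discipline the abstract describes can be illustrated with a runtime check: a dynamic stand-in for what Multi-ML's type-and-effect system enforces statically. The level names and the API here are invented purely for illustration:

```python
class Located:
    """A value tagged with the memory level it lives at (e.g. a node's
    memory vs. a leaf core's memory in a MULTI-BSP hierarchy)."""
    def __init__(self, value, level):
        self.value = value
        self.level = level

def add(x, y):
    """Combining values from different memory levels without an explicit
    transfer is rejected -- the runtime analogue of a locality type error."""
    if x.level != y.level:
        raise TypeError(f"locality error: {x.level} vs {y.level}")
    return Located(x.value + y.value, x.level)

def transfer(x, level):
    """Models an explicit, visible communication between memory levels."""
    return Located(x.value, level)
```

A static type system moves this check from runtime to compile time, which is the "well-typed programs cannot go wrong" guarantee the paper aims for on hierarchical architectures.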
Performance Modeling of Scalable Resource Allocations with the Imperial PEPA Compiler
Pub Date : 2022-07-01 DOI: 10.1109/ISPDC55340.2022.00023
W. Sanders, Srishti Srivastava, I. Banicescu
Advances in computational resources have led to corresponding increases in the scale of large parallel and distributed computer (PDC) systems. With these increases in scale, it becomes increasingly important to understand how these systems will perform as they scale while they are being planned and defined, rather than after deployment. Modeling and simulation of these systems can be used to identify unexpected problems and bottlenecks and to verify operational functionality, and can yield significant cost savings and avoidance if done prior to the often large capital expenditures that accompany major parallel and distributed computer system deployments. In this paper, we evaluate how PDC systems perform as both the number of applications and the number of machines increase. We generate 42,000 models and evaluate them with the Imperial PEPA Compiler to determine the scaling effects across an increasing number of applications and machines. These results are then used to develop a heuristic for predicting the makespan time for sets of applications mapped onto a number of machines, where the applications are subjected to perturbations at runtime. While in the current work the estimated application rates and perturbed rates considered are based on the uniform probability distribution, future work will include a wider range of probability distributions for these rates.
Citations: 0
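A toy version of the ingredients such a makespan heuristic works with, list scheduling plus a perturbation factor that inflates nominal task times, might look like this. The longest-processing-time rule and uniform inflation are illustrative assumptions, not the paper's PEPA-derived model:

```python
def estimate_makespan(app_times, n_machines, perturbation=0.0):
    """Greedy longest-processing-time (LPT) list schedule.
    'perturbation' uniformly inflates every application's nominal time
    to model runtime slowdowns; returns the predicted makespan."""
    loads = [0.0] * n_machines
    for t in sorted(app_times, reverse=True):     # largest tasks first
        m = loads.index(min(loads))               # least-loaded machine
        loads[m] += t * (1.0 + perturbation)
    return max(loads)
```

Sweeping `n_machines` and `perturbation` over such a model is the cheap analogue of the 42,000-model evaluation the paper performs: it shows how predicted makespan scales with machines and degrades under runtime perturbation without deploying anything.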