A System for High Performance Mining on GDELT Data
Pub Date: 2020-05-01 | DOI: 10.1109/IPDPSW50202.2020.00182
Konstantin Pogorelov, Daniel Thilo Schroeder, Petra Filkuková, J. Langguth
We design a system for efficient in-memory analysis of data from the GDELT database of news events. The specialization of the system allows us to avoid the inefficiencies of existing alternatives and to make full use of modern parallel high-performance computing hardware. We then present a series of experiments showcasing the system’s ability to analyze correlations in the entire GDELT 2.0 database, which contains more than a billion news items. The results reveal large-scale trends in today’s online news.
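To make the notion of “analyzing correlations” in event data concrete, the following is a minimal, hypothetical sketch (not the paper’s system): it correlates two synthetic daily count series of the kind one could extract from GDELT event categories.

```python
# Conceptual sketch (not the paper's system): correlating daily counts of two
# hypothetical GDELT event categories held entirely in memory.
import numpy as np

rng = np.random.default_rng(0)
days = 365
protests = rng.poisson(lam=120, size=days).astype(float)   # hypothetical daily counts
appeals = 0.6 * protests + rng.normal(0, 15, size=days)    # synthetic, partly correlated series

# Pearson correlation between the two daily time series.
corr = np.corrcoef(protests, appeals)[0, 1]
print(f"Pearson correlation over {days} days: {corr:.3f}")
```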
{"title":"A System for High Performance Mining on GDELT Data","authors":"Konstantin Pogorelov, Daniel Thilo Schroeder, Petra Filkuková, J. Langguth","doi":"10.1109/IPDPSW50202.2020.00182","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00182","url":null,"abstract":"We design a system for efficient in-memory analysis of data from the GDELT database of news events. The specialization of the system allows us to avoid the inefficiencies of existing alternatives, and make full use of modern parallel high-performance computing hardware. We then present a series of experiments showcasing the system’s ability to analyze correlations in the entire GDELT 2.0 database containing more than a billion news items. The results reveal large scale trends in the world of today’s online news.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124214643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An incremental GraphBLAS solution for the 2018 TTC Social Media case study
Pub Date: 2020-05-01 | DOI: 10.1109/IPDPSW50202.2020.00045
Márton Elekes, Gábor Szárnyas
Graphs are increasingly important for modelling and analysing connected data sets. Traditionally, graph analytical tools targeted global fixed-point computations, while graph databases focused on simpler transactional read operations such as retrieving the neighbours of a node. However, recent applications of graph processing (such as financial fraud detection and serving personalized recommendations) often necessitate a mix of the two workload profiles. A potential approach to tackle these complex workloads is to formulate graph algorithms in the language of linear algebra. To this end, the recent GraphBLAS standard defines a linear algebraic graph computational model and an API for implementing such algorithms. To investigate its usability and efficiency, we have implemented a GraphBLAS solution for the “Social Media” case study of the 2018 Transformation Tool Contest. This paper presents our solution along with an incrementalized variant to improve its runtime for repeated evaluations. Preliminary results show that the GraphBLAS-based solution is competitive but implementing it requires significant development efforts.
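As a rough illustration of the linear-algebra formulation that GraphBLAS standardizes, the sketch below uses plain SciPy (not the GraphBLAS API or the authors’ solution): retrieving the neighbours of a node becomes a sparse matrix-vector product, with a threshold standing in for the Boolean semiring.

```python
# Minimal sketch of the linear-algebra view of a graph query: one-hop neighbour
# retrieval as a sparse matrix-vector product (threshold mimics the Boolean semiring).
import numpy as np
from scipy.sparse import csr_matrix

# Adjacency matrix of a small directed graph with edges 0->1, 0->2, 2->3.
rows, cols = [0, 0, 2], [1, 2, 3]
A = csr_matrix((np.ones(3), (rows, cols)), shape=(4, 4))

v = np.zeros(4)
v[0] = 1.0                        # "frontier" vector containing only node 0

neighbours = (A.T @ v) > 0        # nodes reachable from the frontier in one hop
print(np.nonzero(neighbours)[0])  # -> [1 2]
```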
{"title":"An incremental GraphBLAS solution for the 2018 TTC Social Media case study","authors":"Márton Elekes, Gábor Szárnyas","doi":"10.1109/IPDPSW50202.2020.00045","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00045","url":null,"abstract":"Graphs are increasingly important for modelling and analysing connected data sets. Traditionally, graph analytical tools targeted global fixed-point computations, while graph databases focused on simpler transactional read operations such as retrieving the neighbours of a node. However, recent applications of graph processing (such as financial fraud detection and serving personalized recommendations) often necessitate a mix of the two workload profiles. A potential approach to tackle these complex workloads is to formulate graph algorithms in the language of linear algebra. To this end, the recent GraphBLAS standard defines a linear algebraic graph computational model and an API for implementing such algorithms. To investigate its usability and efficiency, we have implemented a GraphBLAS solution for the “Social Media” case study of the 2018 Transformation Tool Contest. This paper presents our solution along with an incrementalized variant to improve its runtime for repeated evaluations. Preliminary results show that the GraphBLAS-based solution is competitive but implementing it requires significant development efforts.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124438043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Tropical Semiring Multiple Matrix-Product Library on GPUs: (not just) a step towards RNA-RNA Interaction Computations
Pub Date: 2020-05-01 | DOI: 10.1109/IPDPSW50202.2020.00037
Brandon Gildemaster, P. Ghalsasi, S. Rajopadhye
RNA-RNA interaction (RRI) is important in processes such as gene regulation, and certain classes of RRI are known to play roles in various diseases including cancer and Alzheimer’s. Other classes are not as well studied but could have biological importance, so there is a need for high-throughput tools that enable the study of these molecules. Current computational tools for RRI are slow, with execution times of days, weeks or even months for large experiments, because the algorithms have time and space complexities of $\mathrm{O}(N^{3}M^{3})$ and $\mathrm{O}(N^{2}M^{2})$, respectively, for two sequences of lengths $N$ and $M$. No GPU parallelization of such algorithms exists. We show how the most computationally expensive portion of RRI base pair maximization algorithms, an $\mathrm{O}((NM)^{3})$ computation, can be expressed as $\mathrm{O}(N^{3})$ instances of matrix products in the max-plus (tropical) semiring. We therefore propose an optimized library for the core computation of BPMax, an RRI algorithm based on weighted base pair counting. Our library multiplies multiple pairs of matrices in the max-plus semiring. We explore multiple tradeoffs: a square matrix product library attains close to the machine peak, but performs 6-fold unnecessary computation and has a $2\times$ higher data footprint, while the one with the minimum work and memory footprint suffers from thread divergence and unbalanced load. We also specialize for upper banded (trapezoidal shaped) matrices, which are relevant to a windowed version of the algorithm. STOP PRESS: just before we submitted the camera-ready version of the paper, we incorporated our library into a GPU implementation of the complete BPMax algorithm. We will report performance numbers at the workshop.
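For readers unfamiliar with tropical algebra, the following minimal NumPy sketch shows the max-plus matrix product that the library accelerates on GPUs; it is a conceptual illustration only, not the authors’ CUDA implementation, which batches many such products and exploits banded structure.

```python
# Max-plus (tropical) semiring matrix product: semiring "addition" is max,
# semiring "multiplication" is +, and the semiring zero is -infinity.
import numpy as np

def maxplus_matmul(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    # C[i, j] = max_k (A[i, k] + B[k, j])
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

NEG_INF = -np.inf                      # semiring zero
A = np.array([[0.0, 2.0], [NEG_INF, 1.0]])
B = np.array([[1.0, NEG_INF], [3.0, 0.0]])
print(maxplus_matmul(A, B))            # [[5. 2.], [4. 1.]]
```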
{"title":"A Tropical Semiring Multiple Matrix-Product Library on GPUs: (not just) a step towards RNA-RNA Interaction Computations","authors":"Brandon Gildemaster, P. Ghalsasi, S. Rajopadhye","doi":"10.1109/IPDPSW50202.2020.00037","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00037","url":null,"abstract":"RNA-RNA interaction (RRI) is important in processes such as gene regulation, and certain classes of RRI are known to play roles in various diseases including cancer and Alzheimer’s. Other classes are not as well studied but could have biological importance, thus there is a need for highthroughput tools which enable the study of these molecules. Current computational tools for RRI are slow: execution times in days, weeks or even months for large experiments, because the algorithms have time and space complexity, respectively $mathrm {O}( N^{3}M^{3})$ and $mathrm {O}( N^{2}M^{2})$, for two sequences length $N$ and $M$. No GPU parallelization of such algorithms exists. We show how the most computationally expensive portion of RRI base pair maximization algorithms, an $mathrm {O}( NM ) ^{3}$ computation, can be expressed as $mathrm {O}( N^{3})$ instances of such matrix products. We therefore propose an optimized library for the core computation of BPMax, an RRI algorithm based on weighted base pair counting. Our library multiplies multiple pairs of matrices in the max-plus semiring. We explore multiple tradeoffs: a square matrix product library attains close to the machine peak, but does 6-fold unnecessary computations and has $mathrm {a}2 times $ higher data footprint, while the one with the minimum work and memory footprint has thread divergence and unbalanced load. We also specialize for upper banded (trapezoidal shaped) matrices, which are relevant to a windowed version of the algorithm. STOP PRESS: just before we submitted the camera-ready version of the paper, we incorporated our library into a GPU implementation of the complete BPMax algorithm. We will report performance numbers at the workshop.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117061899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Computing Hypergraph Homology in Chapel
Pub Date: 2020-05-01 | DOI: 10.1109/IPDPSW50202.2020.00112
J. Firoz, Louis Jenkins, C. Joslyn, Brenda Praggastis, Emilie Purvine, Mark Raugas
In this paper, we discuss our experience in implementing homology computation, in particular Betti number calculations, in the Chapel Hypergraph Library (CHGL). Given a dataset represented as a hypergraph, the Betti number for a particular dimension k indicates how many k-dimensional ‘voids’ are present in the dataset. Computing the Betti numbers involves various array-centric and linear algebra operations. We demonstrate that implementing these operations in Chapel is both concise and intuitive. In addition, we show that Chapel provides language constructs for implementing parallel and distributed execution of the linear algebra kernels with minimal effort. Syntactically, Chapel provides the succinctness of Python, while delivering performance comparable to C++-based packages and better than Julia-based packages for calculating the Betti numbers.
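The linear-algebra core of such a Betti number calculation can be sketched in a few lines; the NumPy example below (illustrative only, not CHGL’s Chapel implementation) uses the identity beta_k = dim C_k - rank(d_k) - rank(d_{k+1}) over the rationals for a hollow triangle.

```python
# Illustrative sketch of the linear algebra behind Betti numbers:
# beta_k = dim C_k - rank(d_k) - rank(d_{k+1}), with ranks computed over Q.
import numpy as np

def betti(dims, boundaries):
    """dims[k] = number of k-cells; boundaries[k] = matrix of d_k: C_k -> C_{k-1}."""
    ranks = {k: (np.linalg.matrix_rank(M) if M.size else 0) for k, M in boundaries.items()}
    return [dims[k] - ranks.get(k, 0) - ranks.get(k + 1, 0) for k in range(len(dims))]

# Hollow triangle: 3 vertices, 3 oriented edges, no 2-cells -> one connected
# component (beta_0 = 1) and one 1-dimensional "void" (beta_1 = 1).
d1 = np.array([[-1, -1,  0],
               [ 1,  0, -1],
               [ 0,  1,  1]], dtype=float)
print(betti(dims=[3, 3], boundaries={1: d1}))   # -> [1, 1]
```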
{"title":"Computing Hypergraph Homology in Chapel","authors":"J. Firoz, Louis Jenkins, C. Joslyn, Brenda Praggastis, Emilie Purvine, Mark Raugas","doi":"10.1109/IPDPSW50202.2020.00112","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00112","url":null,"abstract":"In this paper, we discuss our experience in implementing homology computation, in particular the Betti number calculations in Chapel hypergraph Library (CHGL). Given a dataset represented as a hypergraph, a Betti number for a particular dimension k indicates how many k-dimensional ‘voids’ are present in the dataset. Computing the Betti numbers involve various array-centric and linear algebra operations. We demonstrate that implementing these operations in Chapel is both concise and intuitive. In addition, we show that Chapel provides language constructs for implementing parallel and distributed execution of the linear algebra kernels with minimal effort. Syntactically, Chapel provides succinctness of Python, while delivering comparable and better performance than C++-based and Julia-based packages for calculating the Betti numbers respectively.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116432404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Framework for the Evaluation of Parallel and Distributed Computing Educational Resources
Pub Date: 2020-05-01 | DOI: 10.1109/IPDPSW50202.2020.00057
David W. Brown, Vitaly Ford, S. Ghafoor
This paper proposes a classification scheme for categorizing PDC (parallel and distributed computing) educational resources. We also propose an evaluation framework for assessing PDC resources. Under the proposed framework, each resource type has a set of criteria, each with an associated score. When evaluated under the framework, a PDC resource obtains a score that is the sum of the scores of the criteria it satisfies. The evaluation of whether a resource meets a criterion is subjective. We also present our evaluation, using the proposed framework, of PDC educational resources available on the web that are appropriate for CS1, CS2 (Computer Science 1 and 2) and DS/A (Data Structures and Algorithms).
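The scoring rule itself is simple to state; the sketch below illustrates it with hypothetical criteria and weights (not the paper’s actual criteria): a resource’s score is the sum of the scores of the criteria it satisfies.

```python
# Minimal sketch of the framework's scoring rule, using hypothetical criteria.
criteria_scores = {                 # hypothetical criteria and weights
    "covers_parallel_speedup": 2,
    "includes_hands_on_lab": 3,
    "provides_assessment": 1,
}

def score(resource_satisfies: set[str]) -> int:
    # A resource's score is the sum of the scores of the criteria it satisfies.
    return sum(s for name, s in criteria_scores.items() if name in resource_satisfies)

print(score({"covers_parallel_speedup", "provides_assessment"}))   # -> 3
```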
{"title":"A Framework for the Evaluation of Parallel and Distributed Computing Educational Resources","authors":"David W. Brown, Vitaly Ford, S. Ghafoor","doi":"10.1109/IPDPSW50202.2020.00057","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00057","url":null,"abstract":"This paper proposes a classification scheme for categorization of PDC educational resources. We have also proposed an evaluation framework for assessing the PDC resources. Under the proposed framework, each resource type has a set of criteria and an associated score. A PDC resource will obtain a score if evaluated under our proposed framework that is the sum of the scores of the criteria that the resource satisfies. The evaluation of whether a resource met a criterion is subjective. We have also presented our evaluation of PDC educational resources appropriate for CS1, CS2 (Computer Science 1 and 2), and DS/A (Data Structures and Algorithms) available on the web using our proposed framework.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125015253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Workshop 10: APDCM Advances in Parallel and Distributed Computational Models
Pub Date: 2020-05-01 | DOI: 10.1109/ipdpsw50202.2020.00095
J. Bordim, K. Nakano, Susumu Matsumae, M. Shibata
The past thirty years have seen a flurry of activity in the area of parallel and distributed computing. In recent years, novel parallel and distributed computational models have been proposed in the literature, reflecting advances in new computational devices and environments such as optical interconnects, programmable logic arrays, networks of workstations, radio communications, mobile computing, DNA computing, quantum computing, and sensor networks. It is very encouraging to note that the advent of these new models has led to significant advances in the resolution of various difficult problems of practical interest.
{"title":"Workshop 10: APDCM Advances in Parallel and Distributed Computational Models","authors":"J. Bordim, K. Nakano, Susumu Matsumae, M. Shibata","doi":"10.1109/ipdpsw50202.2020.00095","DOIUrl":"https://doi.org/10.1109/ipdpsw50202.2020.00095","url":null,"abstract":"The past thirty years have seen a flurry of activity in the area of parallel and distributed computing. In recent years, novel parallel and distributed computational models have been proposed in the literature, reflecting advances in new computational devices and environments such as optical interconnects, programmable logic arrays, networks of workstations, radio communications, mobile computing, DNA computing, quantum computing, sensor networks etc. It is very encouraging to note that the advent of these new models has led to significant advances in the resolution of various difficult problems of practical interest.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"47 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126909402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Workshop 9: PDCO Parallel / Distributed Combinatorics and Optimization
Pub Date: 2020-05-01 | DOI: 10.1109/ipdpsw50202.2020.00089
Grégoire Danoy, D. E. Baz, V. Boyer, B. Dorronsoro, L. Yang, Keqin Li
The IEEE Workshop on Parallel / Distributed Combinatorics and Optimization aims at providing a forum for scientific researchers and engineers on recent advances in the field of parallel or distributed computing for difficult combinatorial optimization problems, such as 0–1 multidimensional knapsack problems, cutting stock problems, scheduling problems, large-scale linear programming problems, nonlinear optimization problems and global optimization problems. Emphasis is placed on new techniques for the solution of these difficult problems, such as cooperative methods for integer programming problems. Techniques based on metaheuristics and nature-inspired paradigms are considered, as are aspects related to Combinatorial Scientific Computing (CSC). In particular, we solicit submissions of original manuscripts on sparse matrix computations, graph algorithms, and original parallel or distributed algorithms. The use of new approaches in parallel and distributed computing, such as GPUs, MICs, FPGAs and volunteer computing, is also considered, as are applications to cloud computing, planning, logistics, manufacturing, finance, telecommunications and computational biology.
{"title":"Workshop 9: PDCO Parallel / Distributed Combinatorics and Optimization","authors":"Grégoire Danoy, D. E. Baz, V. Boyer, B. Dorronsoro, L. Yang, Keqin Li","doi":"10.1109/ipdpsw50202.2020.00089","DOIUrl":"https://doi.org/10.1109/ipdpsw50202.2020.00089","url":null,"abstract":"The IEEE Workshop on Parallel / Distributed Combinatorics and Optimization aims at providing a forum for scientific researchers and engineers on recent advances in the field of parallel or distributed computing for difficult combinatorial optimization problems, like 0–1 multidimensional knapsack problems, cutting stock problems, scheduling problems, large scale linear programming problems, nonlinear optimization problems and global optimization problems. Emphasis is placed on new techniques for the solution of these difficult problems like cooperative methods for integer programming problems. Techniques based on metaheuristics and nature-inspired paradigms are considered. Aspects related to Combinatorial Scientific Computing (CSC) are considered. In particular, we solicit submissions of original manuscripts on sparse matrix computations, graph algorithm and original parallel or distributed algorithms. The use of new approaches in parallel and distributed computing like GPU, MIC, FPGA, volunteer computing are considered. Application to cloud computing, planning, logistics, manufacturing, finance, telecommunications and computational biology are considered.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130516453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tangle Ledger for Decentralized Learning
Pub Date: 2020-05-01 | DOI: 10.1109/IPDPSW50202.2020.00144
R. Schmid, Bjarne Pfitzner, Jossekin Beilharz, B. Arnrich, A. Polze
Federated learning has the potential to make machine learning applicable to highly privacy-sensitive domains and distributed datasets. In some scenarios, however, a central server for aggregating the partial learning results is not available. In fully decentralized learning, a network of peer-to-peer nodes collaborates to form a consensus on a global model without a trusted aggregating party. Often, the network consists of Internet of Things (IoT) and Edge computing nodes. Previous approaches for decentralized learning map the gradient batching and averaging algorithm from traditional federated learning to blockchain architectures. In an open network of participating nodes, the threat of adversarial nodes introducing poisoned models into the network increases compared to a federated learning scenario which is controlled by a single authority. Hence, the decentralized architecture must additionally include a machine learning-aware fault tolerance mechanism to address the increased attack surface. We propose a tangle architecture for decentralized learning, where the validity of model updates is checked as part of the basic consensus. We provide an experimental evaluation of the proposed architecture, showing that it performs well in both model convergence and model poisoning protection.
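The following conceptual sketch (with hypothetical names, not the paper’s implementation) illustrates the core idea: a new transaction averages the models of two approved parents, and peers approve it only if it passes a validity check, which rejects obviously poisoned updates.

```python
# Conceptual sketch of tangle-style decentralized learning: combine two approved
# parent models, and approve a candidate only if it passes a validity check.
# All names and the toy loss are hypothetical illustrations.
import numpy as np

def average_parents(parent_a: np.ndarray, parent_b: np.ndarray) -> np.ndarray:
    """Combine two approved parent models by simple parameter averaging."""
    return (parent_a + parent_b) / 2.0

def is_valid(candidate: np.ndarray, reference: np.ndarray,
             eval_loss, tolerance: float = 0.05) -> bool:
    """Approve a candidate model only if it is not much worse than the reference."""
    return eval_loss(candidate) <= eval_loss(reference) + tolerance

# Toy example: "models" are weight vectors, loss is distance to a hidden optimum.
optimum = np.array([1.0, -2.0, 0.5])
loss = lambda w: float(np.linalg.norm(w - optimum))

parent_a = optimum + np.array([0.1, 0.0, -0.1])
parent_b = optimum + np.array([-0.1, 0.1, 0.0])
poisoned = optimum + 5.0                      # adversarial update far from the optimum

candidate = average_parents(parent_a, parent_b)
print(is_valid(candidate, parent_a, loss))    # True: averaged update is accepted
print(is_valid(poisoned, parent_a, loss))     # False: poisoned update is rejected
```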
{"title":"Tangle Ledger for Decentralized Learning","authors":"R. Schmid, Bjarne Pfitzner, Jossekin Beilharz, B. Arnrich, A. Polze","doi":"10.1109/IPDPSW50202.2020.00144","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00144","url":null,"abstract":"Federated learning has the potential to make machine learning applicable to highly privacy-sensitive domains and distributed datasets. In some scenarios, however, a central server for aggregating the partial learning results is not available. In fully decentralized learning, a network of peer-to-peer nodes collaborates to form a consensus on a global model without a trusted aggregating party. Often, the network consists of Internet of Things (IoT) and Edge computing nodes.Previous approaches for decentralized learning map the gradient batching and averaging algorithm from traditional federated learning to blockchain architectures. In an open network of participating nodes, the threat of adversarial nodes introducing poisoned models into the network increases compared to a federated learning scenario which is controlled by a single authority. Hence, the decentralized architecture must additionally include a machine learning-aware fault tolerance mechanism to address the increased attack surface.We propose a tangle architecture for decentralized learning, where the validity of model updates is checked as part of the basic consensus. We provide an experimental evaluation of the proposed architecture, showing that it performs well in both model convergence and model poisoning protection.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"232 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121310173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Case for Explicit Reuse Semantics for RDMA Communication
Pub Date: 2020-05-01 | DOI: 10.1109/IPDPSW50202.2020.00148
Scott Levy, Patrick M. Widener, C. Ulmer, T. Kordenbrock
Remote Direct Memory Access (RDMA) is an increasingly important technology in high-performance computing (HPC). RDMA provides low-latency, high-bandwidth data transfer between compute nodes. Additionally, it does not require explicit synchronization with the destination processor. Eliminating unnecessary synchronization can significantly improve the communication performance of large-scale scientific codes. A long-standing challenge presented by RDMA communication is mitigating the cost of registering memory with the network interface controller (NIC). Reusing memory once it is registered has been shown to significantly reduce the cost of RDMA communication. However, existing approaches for reusing memory rely on implicit memory semantics. In this paper, we introduce an approach that makes memory reuse semantics explicit by exposing a separate allocator for registered memory. The data and analysis in this paper yield the following contributions: (i) managing registered memory explicitly enables efficient reuse of registered memory; (ii) registering large memory regions to amortize the registration cost over multiple user requests can significantly reduce the cost of acquiring new registered memory; and (iii) reducing the cost of acquiring registered memory can significantly improve the performance of RDMA communication. Reusing registered memory is key to high-performance RDMA communication. By making reuse semantics explicit, our approach has the potential to improve RDMA performance by making it significantly easier for programmers to efficiently reuse registered memory.
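The idea of explicit reuse semantics can be illustrated with a simple pool-allocator sketch; the Python code below is conceptual only (the paper’s library is not described here), and `register_memory` is a placeholder for a real RDMA registration call such as ibv_reg_mr in a C verbs program.

```python
# Conceptual sketch of an explicit registered-memory allocator: register one large
# region up front and hand out sub-buffers, so the expensive registration cost is
# amortized across many requests. `register_memory` is a hypothetical stand-in.
class RegisteredPool:
    """Hand out sub-buffers of a single registered region (registration paid once)."""
    def __init__(self, size: int, register_memory):
        self.buffer = bytearray(size)               # one large region...
        self.handle = register_memory(self.buffer)  # ...registered exactly once
        self.next_offset = 0
        self.free_chunks = []                       # recycled (offset, length) pairs

    def alloc(self, length: int):
        for i, (off, ln) in enumerate(self.free_chunks):
            if ln >= length:                        # reuse a freed chunk if possible
                self.free_chunks.pop(i)
                return (off, length)
        off = self.next_offset
        if off + length > len(self.buffer):
            raise MemoryError("registered pool exhausted")
        self.next_offset += length
        return (off, length)

    def free(self, chunk):
        self.free_chunks.append(chunk)              # memory stays registered for reuse

    def view(self, chunk):
        off, length = chunk
        return memoryview(self.buffer)[off:off + length]

# Usage with a stand-in registration function (a real system would call the NIC).
pool = RegisteredPool(1 << 20, register_memory=lambda buf: "fake-mr-handle")
c1 = pool.alloc(4096)
pool.view(c1)[:5] = b"hello"
pool.free(c1)
c2 = pool.alloc(4096)          # reuses the same registered chunk, no new registration
assert c1 == c2
```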
{"title":"The Case for Explicit Reuse Semantics for RDMA Communication","authors":"Scott Levy, Patrick M. Widener, C. Ulmer, T. Kordenbrock","doi":"10.1109/IPDPSW50202.2020.00148","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00148","url":null,"abstract":"Remote Direct Memory Access (RDMA) is an increasingly important technology in high-performance computing (HPC). RDMA provides low-latency, high-bandwidth data transfer between compute nodes. Additionally, it does not require explicit synchronization with the destination processor. Eliminating unnecessary synchronization can significantly improve the communication performance of large-scale scientific codes. A long-standing challenge presented by RDMA communication is mitigating the cost of registering memory with the network interface controller (NIC). Reusing memory once it is registered has been shown to significantly reduce the cost of RDMA communication. However, existing approaches for reusing memory rely on implicit memory semantics. In this paper, we introduce an approach that makes memory reuse semantics explicit by exposing a separate allocator for registered memory. The data and analysis in this paper yield the following contributions: (i) managing registered memory explicitly enables efficient reuse of registered memory; (ii) registering large memory regions to amortize the registration cost over multiple user requests can significantly reduce cost of acquiring new registered memory; and (iii) reducing the cost of acquiring registered memory can significantly improve the performance of RDMA communication. Reusing registered memory is key to high-performance RDMA communication. By making reuse semantics explicit, our approach has the potential to improve RDMA performance by making it significantly easier for programmers to efficiently reuse registered memory.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116656177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}