Pub Date: 2018-03-21 | DOI: 10.1109/PDP2018.2018.00102
Jie Hou, M. Radetzki
Technology scaling makes it possible to implement systems with hundreds of processing cores, and thousands in the future. Communication in such systems is enabled by Networks-on-Chip (NoCs). A downside of technology scaling is the increased susceptibility to failures in NoC resources. Ensuring reliable operation despite such failures degrades NoC performance and may even invalidate the performance benefits expected from scaling. Thus, it is not enough to analyze performance and reliability in isolation, as is usually done. Instead, we suggest treating both aspects together using the concept of performability and its analysis with Markov reward models. Our methodology is exemplified for mesh NoCs and transient faults but can be transferred to other topologies and fault models. We investigate how performability develops with scaling towards larger NoCs and explore the limits of scaling by determining the break-even failure rates under which scaling can achieve a net performance increase.
Title: Performability Analysis of Mesh-Based NoCs Using Markov Reward Model
Venue: 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)
Pub Date: 2018-03-21 | DOI: 10.1109/PDP2018.2018.00024
R. Krawczyk, P. Linczuk, A. Wojeński, K. Poźniak, G. Kasprowicz, Wojciech Zabolotny, M. Gąska, D. Mazon, A. Jardin, T. Czarski, P. Kolasiński, M. Chernyshova, E. Kowalska-Strzeciwilk, K. Malinowski
The article presents results of a novel approach that combines high-performance parallel computing solutions with front-end electronics to develop a scalable, specialized soft X-ray measurement tool for large-scale plasma physics experiments on thermal fusion devices. To meet the need for easily modifiable, advanced diagnostics of hot tokamak plasma content, a heterogeneous system consisting of FPGAs and a PC server was introduced. The objective is to provide data quality monitoring and evaluation mechanisms, along with an algorithm benchmarking tool, for fast, low-latency measurements of soft X-rays emitted by hot tokamak plasma. The article describes a method for developing the computation pipeline on the server side. The novel parallel algorithms and results are discussed. This new approach aims to adapt HPC techniques to new areas of science where comprehensive low-latency measurements and instrumentation are increasingly desired. The presented solution is deployed on the operational tokamak WEST (Tungsten Environment in Steady-State Tokamak) in collaboration with the French Alternative Energies and Atomic Energy Commission (CEA), Cadarache, France.
Title: Novel Application of Parallel Computing Techniques in Soft X-Rays Plasma Measurement Systems for the WEST Experimental Thermal Fusion Reactor
Pub Date: 2018-03-21 | DOI: 10.1109/PDP2018.2018.00043
Gauthier Sornet, S. Jubertie, F. Dupros, F. D. Martin, P. Thierry, Sébastien Limet
The Finite-Element Method (FEM) is routinely used to solve Partial Differential Equations (PDEs) in various scientific domains. For seismic wave modeling, the Spectral Element Method (SEM), a specific formulation of the classical FEM approach, has gained significant attention over the last two decades. This is explained both by the very good numerical accuracy of the method and by the parallel performance of classical MPI-based implementations, which scale up to several tens of thousands of computing cores. Nevertheless, the trend of current processors towards increasing levels of low-level parallelism requires significant effort at the shared-memory level. One major bottleneck comes from the standard FEM assembly phase, which leads to a significant amount of irregular memory accesses and prevents, for instance, efficient automatic optimization by the compiler. In this paper, we extract a kernel from a spectral-element application dedicated to earthquake simulations in complex geological media (the EFISPEC code developed at BRGM, the French Geological Survey). We study its intra-node behavior and propose different levels of optimization (data layout, manual vectorization, multi-threading) to fully benefit from SIMD units and NUMA architectures. Experiments performed on the Intel Broadwell architecture show that the proposed optimizations dramatically improve the intra-node performance of the mini-application. Moreover, our results show a good match with roofline theoretical performance models. We believe that these optimizations are not specific to this mini-application and may be implemented in other SEM- and FEM-based solvers as well.
Title: Data-Layout Reorganization for an Efficient Intra-Node Assembly of a Spectral Finite-Element Method
Pub Date: 2018-03-21 | DOI: 10.1109/PDP2018.2018.00089
G. Bella, Francesco Marino, Gianpiero Costantino, F. Martinelli
Mobile users have become accustomed to receiving useful information while they are literally on the move. An implication of this habit is that certain live information, such as that for navigation, dating, and handling emergencies, should be tailored to the user's current location. While this is technically feasible with current technology, it raises concerns about the user's location privacy. To address the delicate tradeoff between the user's location privacy and the appropriateness of the information for that location, this paper discusses three information delivery protocols. One is the widely adopted Android protocol; the other two are the authors' novel ones, termed the AL protocol and the LBPP protocol respectively. The former conceals the user's location within a geographical area, while the latter employs secure two-party computation. The privacy of all three protocols is analysed, motivating the choice to implement the LBPP protocol. It is made available as the "Getmewhere" service for the reader to download.
Title: Getmewhere: A Location-Based Privacy-Preserving Information Service
Pub Date: 2018-03-21 | DOI: 10.1109/PDP2018.2018.00090
T. Jun, Daeyoun Kang, Dohyeun Kim, Daeyoung Kim
A new form of cloud computing, serverless computing, is drawing attention as a new way to design micro-service architectures. In a serverless computing environment, services are developed as functional units. At present, the function development environments of all serverless computing frameworks are CPU-based. In this paper, we propose a GPU-supported serverless computing framework that can deploy services faster than existing CPU-based serverless computing frameworks. Our core approach is to integrate an open-source serverless computing framework with NVIDIA-Docker and deploy services in GPU-supporting containers. We have developed an API that connects the open-source framework to NVIDIA-Docker, together with commands that enable GPU programming. In our experiments, we measured the performance of the framework in various environments. As a result, developers who want to develop services through the framework can deploy high-performance micro-services, and developers who want to run deep learning programs without a GPU environment can run code on remote GPUs with little performance degradation.
Title: GPU Enabled Serverless Computing Framework
Pub Date: 2018-03-21 | DOI: 10.1109/PDP2018.2018.00077
C. Putman, Abhishta Abhishta, L. Nieuwenhuis
Botnets continue to be an active threat against firms and individuals worldwide. Previous research on botnets has unveiled information on how these systems and their stakeholders operate, but insight into the economic structure that supports these stakeholders is lacking. The objective of this research is to analyse the business model and determine the revenue stream of a botnet owner. We also study the botnet life-cycle and determine the costs associated with it on the basis of four case studies. We conclude that building a full-scale cyber army from scratch is very expensive, whereas acquiring a previously developed botnet costs comparatively little. We find that the initial setup and monthly costs were minimal compared to total revenue.
Title: Business Model of a Botnet
Pub Date: 2018-03-21 | DOI: 10.1109/PDP2018.2018.00049
Tim Süß, Tunahan Kaya, Dustin Feld
For many years now, processor vendors have increased the performance of their devices by adding more cores and wider vectorization units to their CPUs instead of scaling up the processors' clock frequency. Moreover, GPUs have become popular for solving problems with even more parallel compute power. To exploit the full potential of modern compute devices, specific codes are necessary, which are often written in a hardware-specific manner. Usually, codes for CPUs are not usable on GPUs and vice versa. The OpenACC programming API tries to close this gap by enabling one code base to be suitable and optimized for many devices. Nevertheless, OpenACC is rarely used by `standard programmers', and while different code transformers (like PluTo) allow for (semi-)automatic code parallelization for multi-core CPUs, they generally do not support OpenACC yet. We present first promising results of our PluTo extension that generates parallelized codes using OpenACC. Using our transformer, we create programs that exploit the parallelism of different platforms without any manual modifications, achieving speedups of up to 100x in comparison to the original unoptimized programs and accelerations of 2.05x in comparison to equally generated OpenMP codes.
Title: Extending PluTo for Multiple Devices by Integrating OpenACC
Pub Date: 2018-03-21 | DOI: 10.1109/PDP2018.2018.00099
A. Rango, Pietro Napoli, D. D'Ambrosio, W. Spataro, A. D. Renzo, F. Maio
Here we present different preliminary parallel grid-based implementations of a simple particle system, with the purpose of evaluating their performance on multi- and many-core computational devices. The system is modeled by means of the Discrete Element Method and the Extended Cellular Automata formalism, while OpenMP and OpenCL are used for parallelization. In particular, both the 3.1 and 4.5 OpenMP specifications have been considered, the latter of which can also run on many-core computational devices like GPUs. The results of a first test simulation, performed on a cubic domain with about 316,000 particles, show a clear advantage for OpenCL on the considered Nvidia Tesla K40 GPU, while the OpenMP 3.1 implementation performed better than the corresponding OpenMP 4.5 one on the considered Intel Xeon E5-2650 16-thread CPU.
Title: Structured Grid-Based Parallel Simulation of a Simple DEM Model on Heterogeneous Systems
Pub Date: 2018-03-21 | DOI: 10.1109/PDP2018.2018.00041
Shin Morishima, Hiroki Matsutani
Blockchain is a distributed ledger system based on a P2P network and originally used for cryptocurrency systems. The P2P network of a Blockchain is maintained by full nodes, which are in charge of verifying all the transactions in the network. However, most Blockchain user nodes do not act as full nodes, because the workload of full nodes is quite high for personal mobile devices. Blockchain search queries from many users, such as confirming balances, transaction contents, and transaction histories, therefore go to the full nodes. As a result, the search throughput of full nodes can become a new bottleneck of a Blockchain system, because the number of full nodes is smaller than the number of users. In this paper, we propose an acceleration method for Blockchain search using GPUs. More specifically, we introduce an array-based Patricia tree structure suitable for GPU processing, making effective use of the Blockchain property that there are no update or delete queries. In the evaluations, the proposed method is compared with an existing GPU-based key-value search and a conventional CPU-based search in terms of Blockchain key search throughput. As a result, the throughput of our proposal is 3.4 times higher than that of the existing GPU-based search and 14.1 times higher than that of the CPU search when the number of keys is 80 × 2^20 and the key length is 256 bits.
Title: Accelerating Blockchain Search of Full Nodes Using GPUs
Pub Date: 2018-03-21 | DOI: 10.1109/PDP2018.2018.00092
G. Utrera, Marisa Gil, X. Martorell
Algorithmic codes for scientific computing may exhibit diverse levels of tolerance to memory errors, depending on the program's behavior when accessing data. Several factors that can be controlled in an HPC program may influence its degree of tolerance to memory errors. A characterization of the degree of vulnerability an application exhibits can help to improve its security as well as save time and resources. In this work, we study some of the main factors that may have an impact on the propagation of errors originating from memory accesses.
Title: Analysis of the Impact Factors on Data Error Propagation in HPC Applications