
2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP): Latest Papers

High Performance Computation for the Multi-Parameterized Edit Distance
Francesco Cauteruccio, Davide Consalvo, G. Terracina
In this paper, we propose a method for computing a novel distance metric, called Multi-Parameterized Edit Distance (MPED), among strings defined over heterogeneous alphabets. We show that computing MPED is hard and that several interesting application contexts can benefit from it. We then present a novel implementation strategy based on an evolutionary heuristic, which we experimentally demonstrate to be efficient and effective for the problem at hand. Our approach paves the way for adopting this new metric in all contexts in which the strings involved come from heterogeneous sources, each with its own alphabet.
DOI: https://doi.org/10.1109/PDP2018.2018.00096 (published 2018-03-01)
Citations: 1
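To make the Multi-Parameterized Edit Distance abstract above more concrete, here is a minimal Python sketch of why comparing strings over different alphabets can be expensive. It is not the paper's definition of MPED or its evolutionary heuristic. It assumes, purely for illustration, that the distance is the smallest unit-cost edit distance obtained over all one-to-one mappings between the two alphabets; the function names `edit_distance` and `min_distance_over_mappings` are hypothetical.

```python
from itertools import permutations


def edit_distance(a, b):
    """Classic unit-cost Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, sym_a in enumerate(a, start=1):
        curr = [i]
        for j, sym_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if sym_a == sym_b else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def min_distance_over_mappings(a, b):
    """Illustrative brute force (assumption, not the paper's method):
    try every one-to-one mapping of a's symbols onto b's symbols
    (a symbol may also stay unmatched) and keep the smallest distance.
    The number of mappings grows factorially with the alphabet size."""
    alpha_a = sorted(set(a))
    alpha_b = sorted(set(b))
    # None entries let a symbol of `a` remain without a partner in `b`.
    targets = alpha_b + [None] * len(alpha_a)
    best = None
    for image in permutations(targets, len(alpha_a)):
        mapping = dict(zip(alpha_a, image))
        # Unmatched symbols become tuples, which never equal a symbol of b.
        translated = [
            mapping[s] if mapping[s] is not None else ("unmatched", s)
            for s in a
        ]
        d = edit_distance(translated, list(b))
        if best is None or d < best:
            best = d
    return best
```

For example, `"aab"` and `"xxy"` share no symbols, so their plain edit distance is 3, but under the mapping a to x and b to y the strings coincide. The factorial search space in this sketch is the kind of cost that motivates a heuristic search.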
A Portable Multidimensional Coarray for C++
Felix Mößbauer, R. Kowalewski, Tobias Fuchs, K. Fürlinger
Fortran Coarrays are a well-known data structure in High Performance Computing (HPC) applications. There have been various attempts to port the concept to other programming languages that have a wider user base outside of scientific computing. While Unified Parallel C (UPC) is a popular implementation of the partitioned global address space (PGAS) model, there is currently no portable implementation of Coarrays for C++. This paper presents a portable version that is closely based on the Coarray C++ implementation of the Cray Compiling Environment. We focus on a common subset of the features proposed by Cray. Our implementation uses the distributed data structures provided by the DASH library, demonstrating their universal applicability. Finally, a performance evaluation shows that our proposed Coarray abstraction adds negligible overhead and even outperforms native Coarray Fortran.
DOI: https://doi.org/10.1109/PDP2018.2018.00012 (published 2018-03-01)
Citations: 2
Cellular Automata Modelling of the Movement of People with Disabilities during Building Evacuation
P. Kontou, I. Georgoudas, G. Trunfio, G. Sirakoulis
This study deals with the evacuation of areas that involve people with disabilities. A crowd evacuation model has been developed using the Cellular Automata (CA) parallel computing tool. The model can simulate and evaluate human behavior and the special features that arise when people with disabilities take part in an evacuation. In the experiments, the model simulates the evacuation of a secondary school for disabled children in the prefecture of Xanthi. The total evacuation time was recorded while attending and observing an earthquake safety exercise organized by this school. The developed model is then validated against these actual data, and useful conclusions are drawn for the specific application area. In addition, by modifying the original data, the model can be applied to any building.
DOI: https://doi.org/10.1109/PDP2018.2018.00093 (published 2018-03-01)
Citations: 10
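The cellular automaton idea in the evacuation abstract above can be illustrated with a deliberately tiny Python sketch. It is not the authors' model: it assumes a one-dimensional corridor with the exit at cell 0 and represents reduced mobility by a hypothetical `period` parameter, meaning an agent may act only on time steps that are multiples of its period.

```python
def evacuation_time(agents, max_steps=10_000):
    """Return the number of time steps until the corridor is empty.

    `agents` maps a cell index (0 is the exit cell) to the agent's period.
    An agent with period p acts only when the step counter is a multiple
    of p; acting means leaving from cell 0 or advancing one cell toward
    the exit if that cell is free. All of this is an illustrative
    assumption, not the rule set of the published model.
    """
    occupied = dict(agents)  # cell index -> period of the agent in it
    t = 0
    while occupied:
        t += 1
        if t > max_steps:
            raise RuntimeError("evacuation did not finish")
        # Sequential update, starting with the agent closest to the exit.
        for pos in sorted(occupied):
            period = occupied[pos]
            if t % period != 0:
                continue  # this agent does not act on this step
            if pos == 0:
                del occupied[pos]  # agent leaves through the exit
            elif pos - 1 not in occupied:
                del occupied[pos]
                occupied[pos - 1] = period  # advance one cell
    return t
```

With a slow agent (period 2) in cell 1 and a fast agent (period 1) behind it in cell 2, the fast agent is blocked and the corridor empties after 5 steps, whereas the fast agent alone would need 3. This is the kind of interaction an evacuation model has to capture.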
Task Parallelism in the WRF Model Through Computation Offloading to Many-Core Devices
Rodrigo Baya, C. Porrini, M. Pedemonte, P. Ezzatti
In the last decade, the use of hybrid hardware (e.g., multicore processors + coprocessors) has been growing in the HPC field. However, the WRF model has not fully exploited this evolution in HPC hardware, since its scalability is limited when a large number of computing units are used. In a previous work, we proposed an asynchronous architecture for the WRF that overlaps the radiation computation with the execution of the rest of the model. In this work, we extend this idea with the aim of exploiting the computational power offered by hybrid hardware platforms. Specifically, we implement an OpenMP version of the asynchronous architecture and include the use of two types of coprocessors, a Xeon Phi and a GPU. The experimental evaluation shows that our proposal is able to adequately exploit these secondary computation devices, reaching interesting runtime reductions when solving test cases from real scenarios.
DOI: https://doi.org/10.1109/PDP2018.2018.00100 (published 2018-03-01)
Citations: 2
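The overlap described in the WRF abstract above (running the radiation computation concurrently with the rest of the model) follows a common task-parallel pattern. The Python sketch below shows only that pattern. The functions `radiation` and `dynamics` are hypothetical stand-ins, and a thread pool plays the role that a coprocessor plays in the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def radiation(state):
    """Hypothetical stand-in for the expensive radiation computation."""
    return [0.5 * x for x in state]


def dynamics(state):
    """Hypothetical stand-in for the rest of the model time step."""
    return [x + 1.0 for x in state]


def step_overlapped(state, pool):
    # Launch radiation asynchronously on the worker ("offload").
    future = pool.submit(radiation, state)
    # Meanwhile, the main thread advances the rest of the model.
    new_state = dynamics(state)
    # Synchronization point: wait for the radiation result.
    heating = future.result()
    return [s + h for s, h in zip(new_state, heating)]


def run(state, steps):
    with ThreadPoolExecutor(max_workers=1) as pool:
        for _ in range(steps):
            state = step_overlapped(state, pool)
    return state
```

Both tasks read the same input state and their results are combined only at the synchronization point, which is what makes the overlap safe in this sketch.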
ECHO-3DHPC: Relativistic Accretion Disks onto Black Holes
M. Bugli
Current state-of-the-art simulations of accretion flows onto black holes require a significant level of numerical sophistication to allow three-dimensional modeling of relativistic magnetized plasma in a regime of strong gravity. We present a new version of the GRMHD code ECHO, developed in collaboration with the Max Planck Computing and Data Facility (MPCDF) and the Leibniz Rechenzentrum (LRZ), which employs a hybrid multidimensional MPI-OpenMP parallelization coupled with the production of MPI-HDF5 output files. The code's high degree of parallelization has been crucial for the study of some fundamental properties of thick accretion disks around black holes, in particular the excitation of non-axisymmetric modes in the presence of both hydrodynamic and magnetohydrodynamic instabilities.
DOI: https://doi.org/10.1109/PDP2018.2018.00112 (published 2018-03-01)
Citations: 2
SharP Data Constructs: Data Constructs to Enable Data-Centric Computing
Ferrol Aderholdt, Manjunath Gorentla Venkata, Zachary W. Parchman
Extreme-scale applications (i.e., Big-Compute) are becoming increasingly data-intensive, producing and consuming ever larger amounts of data. The HPC systems traditionally used for these applications are now used for Big-Data applications such as data analytics, social network analysis, machine learning, and genomics. As a consequence of these trends, the system architecture should be flexible and data-centric. This can already be seen in the pre-exascale systems, with TBs of on-node hierarchical and heterogeneous memories, PBs of system memory, low-latency, high-throughput networks, and many-threaded cores. As such, the pre-exascale systems suit the needs of both Big-Compute and Big-Data applications. Though the system architecture is flexible enough to support both, we argue there is a software gap. In particular, we need data-centric abstractions to leverage the full potential of the system: native support for data resilience, the ability to express data locality and affinity, mechanisms to reduce data movement, the ability to share data, and abstractions to express users' data usage and data access patterns. In this paper, we (i) show the need for a holistic approach towards data-centric abstractions, (ii) show how these approaches were realized in the SHARed data-structure centric Programming abstraction (SharP) library, and (iii) apply these approaches to a variety of applications that demonstrate their usefulness. In particular, we apply these approaches to QMCPack and the Graph500 benchmark and demonstrate their advantages on extreme-scale systems.
DOI: https://doi.org/10.1109/PDP2018.2018.00031 (published 2018-03-01)
Citations: 0
MinVisited: A Message Routing Protocol for Delay Tolerant Network
Luis Veas-Castillo, Gabriel Ovando-Leon, V. Gil-Costa, Mauricio Marín
We study routing protocols for Delay Tolerant Networks devised to improve message delivery performance in natural disaster scenarios. In this paper we propose the MinVisited protocol, which, along the transitive path to the message destination, selects the next node based on two features: (1) the most distant neighbor, and (2) the largest number of encounters with the destination node of the message. We compare our protocol with well-known protocols from the technical literature. The results show that the proposed protocol has a low workload overhead, with fewer than 2 hops, and that on average 95% of the messages are successfully delivered.
DOI: https://doi.org/10.1109/PDP2018.2018.00057 (published 2018-03-01)
Citations: 2
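The next-hop rule in the MinVisited abstract above names two features but does not say how they are combined. The Python sketch below assumes, purely for illustration, that the number of encounters with the destination is compared first and that distance breaks ties. The `Neighbor` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Neighbor:
    node_id: str
    distance_m: float          # distance from the current node
    encounters_with_dest: int  # past contacts with the destination


def select_next_hop(neighbors):
    """Pick the neighbor with the most encounters with the destination;
    among equals, prefer the most distant one. The ordering of the two
    criteria is an assumption, not taken from the paper."""
    if not neighbors:
        return None  # no contact available: keep carrying the message
    return max(
        neighbors,
        key=lambda n: (n.encounters_with_dest, n.distance_m),
    )
```

A weighted score of the two features would be an equally plausible reading of the abstract; the lexicographic order here is simply the easiest one to state.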
Energy-Efficient Actor Execution for SDF Application on Heterogeneous Architectures
Hergys Rexha, S. Lafond, K. Desnos
Heterogeneous systems promise to improve the performance and endurance of power-constrained systems by using computing elements with different power and performance characteristics. Such systems make it possible to trade off the number and types of cores against Dynamic Voltage and Frequency Scaling (DVFS) levels and core utilization rates to achieve optimal energy efficiency. By making smart decisions on application scheduling and mapping, we can therefore maximize the benefits of heterogeneous processors. At the same time, application-level parallelism can conveniently be exposed by dataflow Models of Computation (MoCs). In this paper we present an energy-efficient execution approach for heterogeneous architectures. We demonstrate the approach on a real-life streaming application modelled with Parameterized and Interfaced Synchronous Dataflow (PiSDF). The presented solution shows how to integrate our approach into the workflow of a dataflow application prototyping tool. The results demonstrate that, with optimal scheduling and mapping, an energy reduction of more than 30% can be achieved at the level of a single actor.
DOI: https://doi.org/10.1109/PDP2018.2018.00083 (published 2018-03-01)
Citations: 1
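The trade-off in the energy-efficiency abstract above (core type and DVFS level versus execution time) can be illustrated with the basic relation energy = power × time. The Python sketch below picks, for a single actor, the configuration with the lowest energy that still meets a deadline. The configurations and all numbers are made up for illustration and are not taken from the paper.

```python
# Hypothetical (core type, DVFS level) configurations for one actor.
CONFIGS = [
    {"name": "big@high", "power_w": 4.0, "time_s": 1.0},
    {"name": "big@low", "power_w": 2.0, "time_s": 1.8},
    {"name": "little@high", "power_w": 1.0, "time_s": 3.0},
]


def energy_j(config):
    """Energy in joules: average power times execution time."""
    return config["power_w"] * config["time_s"]


def pick_config(configs, deadline_s):
    """Return the lowest-energy configuration that meets the deadline,
    or None if no configuration is fast enough."""
    feasible = [c for c in configs if c["time_s"] <= deadline_s]
    if not feasible:
        return None
    return min(feasible, key=energy_j)
```

With these made-up numbers, the fastest configuration is also the most energy-hungry, so relaxing the deadline lets the selection move to a slower and cheaper configuration.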
Parallelizing and Optimizing LHCb-Kalman for Intel Xeon Phi KNL Processors
P. Fernández, David del Rio Astorga, M. F. Dolz, Javier Fernández, O. Awile, José Daniel García Sánchez
Real-time data processing is an important component of particle physics experiments with large computing resource requirements. As the Large Hadron Collider (LHC) at CERN prepares for its next upgrade, the LHCb experiment is upgrading its detector for a 30x increase in data throughput. In preparation for this upgrade, the experiment is considering a number of architectural improvements encompassing both its software and hardware infrastructure. One of the hardware platforms under consideration is the Intel Xeon-Phi Knights Landing processor. Thanks to its on-package high-bandwidth memory and many-core architecture, it offers an interesting alternative to more traditional server systems. We present a scalable, multi-threaded and NUMA-aware Kalman filter proto-application for particle track fitting, expressed in terms of generic parallel patterns using the GrPPI interface. We show how code maintainability and readability improve while performance remains comparable to the baseline implementation. This is achieved by keeping the parallel algorithms in the underlying framework generic but topology-aware through the use of the Portable Hardware Locality (hwloc) library, which allows us to target different architectures with the same program. We measure the performance of our topology-aware GrPPI Kalman filter implementation on the Intel Xeon-Phi Knights Landing platform and draw conclusions on the feasibility of integrating such high-level parallelization libraries into complex software frameworks such as LHCb's Gaudi framework.
DOI: https://doi.org/10.1109/PDP2018.2018.00121 (published 2018-03-01)
Citations: 1
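For readers unfamiliar with the Kalman filter named in the LHCb abstract above, the following Python sketch shows one predict/update cycle of the textbook scalar filter. It is not the LHCb track-fitting code, which works on multi-dimensional track states and is written against GrPPI. The function name and the simple static state model are assumptions for illustration.

```python
def kalman_step(x, p, z, q=0.0, r=1.0):
    """One predict/update cycle of a scalar Kalman filter.

    x, p : prior state estimate and its variance
    z    : new measurement
    q    : process noise variance (state assumed constant between steps)
    r    : measurement noise variance
    """
    # Predict: the state model is x_k = x_{k-1} + noise(q).
    x_pred = x
    p_pred = p + q
    # Update: blend prediction and measurement using the Kalman gain.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new
```

Each track can be filtered independently of the others, which is one reason this kind of workload maps well onto data-parallel patterns.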
High-Resolution Numerical Relativity Simulations of Spinning Binary Neutron Star Mergers
T. Dietrich, S. Bernuzzi, B. Brügmann, W. Tichy
The recent detection of gravitational waves and electromagnetic counterparts emitted during and after the collision of two neutron stars marks a breakthrough in the field of multi-messenger astronomy. Numerical relativity simulations are the only tool to describe the binary's merger dynamics in the regime where speeds are largest and gravity is strongest. In this work we report state-of-the-art binary neutron star simulations for irrotational (non-spinning) and spinning configurations. The main use of these simulations is to model the gravitational-wave signal. Key numerical requirements are an understanding of the convergence properties of the numerical data and a detailed error budget. The simulations have been performed on different HPC clusters, use multiple grid resolutions, and are based on eccentricity-reduced quasi-circular initial data. We obtain convergent waveforms with phase errors of 0.5-1.5 rad accumulated over ~12 orbits to merger. The waveforms have been used to construct a phenomenological waveform model, which has been applied to the analysis of the recent binary neutron star detection. Additionally, we show that the data can also be used to test other state-of-the-art semi-analytical waveform models.
DOI: https://doi.org/10.1109/PDP2018.2018.00113 (published 2018-03-01)
Citations: 10