2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP): Latest Papers

TMbarrier: Speculative Barriers Using Hardware Transactional Memory
Manuel Pedrero, E. Gutiérrez, O. Plata
The barrier is a very common synchronization method in parallel programming. Barriers are typically used to enforce a partial thread execution order, since there may be dependences between code sections before and after the barrier. This work proposes TMbarrier, a new barrier design intended for transactional applications. TMbarrier allows threads to continue executing speculatively after the barrier, assuming that there are no dependences with safe threads that have not yet reached the barrier. Our design leverages transactional memory (TM), specifically the implementation offered by the IBM POWER8 processor, to hold the speculative updates and to detect possible conflicts between speculative and safe threads. Despite the limitations of the best-effort hardware TM implementations present in current processors, experiments show a reduction in the time wasted on synchronization compared to standard barriers.
DOI: https://doi.org/10.1109/PDP2018.2018.00036 (publication date: 2018-11-15)
Citations: 1
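For reference, the conventional barrier that TMbarrier is compared against can be sketched as follows. This is only an illustration of the baseline, not of TMbarrier itself: Python exposes no hardware transactional memory, and the function name `run_phases` and the toy workload are assumptions made for this example. Each thread writes a value before the barrier and, after it, reads a value written by another thread, which is the kind of cross-barrier dependence the abstract refers to.

```python
import threading


def run_phases(num_threads=4):
    """Two code sections separated by a conventional (non-speculative) barrier."""
    barrier = threading.Barrier(num_threads)
    phase1 = [0] * num_threads  # written before the barrier
    phase2 = [0] * num_threads  # written after the barrier

    def worker(tid):
        phase1[tid] = tid * tid  # code section before the barrier
        barrier.wait()           # block until every thread has arrived
        neighbour = (tid + 1) % num_threads
        # This read depends on another thread's pre-barrier write.
        phase2[tid] = phase1[tid] + phase1[neighbour]

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return phase2


print(run_phases(4))  # [1, 5, 13, 9]
```

Every thread that arrives early sits idle in `barrier.wait()`; that idle time is the synchronization cost the paper aims to reduce by letting early threads continue speculatively.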
A Generic Learning Multi-agent-System Approach for Spatio-Temporal-, Thermal- and Energy-Aware Scheduling
Christina Herzog, J. Pierson
This paper proposes an agent-based approach to the scheduling of jobs in data centers under thermal constraints. The model encompasses both temporal and spatial aspects of the temperature evolution using a unified model, taking into account the dynamics of heat production and dissipation. Agents coordinate to eventually move jobs to the most suitable place and to dynamically adapt the frequency settings of the nodes to the best combination. Several objectives of the agents are compared under different circumstances in an extensive set of experiments.
DOI: https://doi.org/10.1109/PDP2018.2018.00010 (publication date: 2018-06-07)
Citations: 1
Evaluating the Effect of Multi-Tenancy Patterns in Containerized Cloud-Hosted Content Management System
A. Adewojo, J. Bass
Multi-tenancy in cloud computing describes the extent to which resources can be shared while guaranteeing isolation among components (tenants) using these resources. There are three multi-tenancy patterns: shared, tenant-isolated and dedicated component patterns. These patterns have not previously been formally specified. In order to create a precise definition and verify each pattern, we formally specify each pattern using the Z language. To validate the interpretation of our formal description, we empirically evaluate each pattern using the data tier of a cloud-hosted distributed content management application, WordPress, deployed in a Docker container. Experimental results show that the dedicated pattern successfully managed larger numbers of tenants with fewer unhandled request errors. The shared and tenant-isolated patterns exhibited a larger number of unhandled request errors when the number of tenants increased. We present a selection algorithm to choose a suitable multi-tenancy pattern for cloud deployment of a content management system.
DOI: https://doi.org/10.1109/PDP2018.2018.00047 (publication date: 2018-06-07)
Citations: 4
Developing and Using a Geometric Multigrid, Unstructured Grid Mini-Application to Assess Many-Core Architectures
A. Owenson, Steven A. Wright, Richard A. Bunt, S. Jarvis, Y. Ho, Matthew J. Street
Achieving high performance in large scientific codes is a difficult task. This has led to the development of numerous mini-applications that are more tractable to analyse, while retaining performance characteristics of their full-sized counterparts. These "mini-apps" also enable faster hardware evaluation, and for sensitive codes allow evaluation of systems outside of access approval processes. In this paper we develop a mini-application of a geometric multigrid, unstructured grid Computational Fluid Dynamics (CFD) code, designed to exhibit similar performance characteristics without sharing code. We detail our experiences developing this application, using guidelines detailed in existing research, and contribute further additions to these to aid future mini-application developers. Our application is validated against the inviscid flux routine of HYDRA, a CFD code developed by Rolls-Royce, which confirms that the parent kernel and mini-application share fundamental causes of parallel inefficiency. We then use the mini-application to assess the impact of Intel's Knights Landing (KNL) on performance. We find that the mini-app and parent kernel continue to share scaling characteristics; however, a comparison with Broadwell performance exposed significant differences between the kernels that were undetected by the validation.
DOI: https://doi.org/10.1109/PDP2018.2018.00018 (publication date: 2018-06-06)
Citations: 2
Improving Availability in Distributed Tuple Spaces Via Sharing Abstractions and Replication Strategies
V. Buravlev, R. Nicola, Alberto Lluch-Lafuente, C. A. Mezzina
Data availability is a key aspect of modern distributed systems. We discuss an extension of coordination languages based on tuple spaces with programming abstractions for sharing data and guaranteeing availability with different consistency guarantees. Data can be spread over the system according to user-specified replica placement strategies and user-specified consistency requirements. The framework then takes care of the low-level management of the replicas, so that the programmer can focus on the business logic of the application. We argue that the proposed programming primitives are beneficial for data-oriented applications where different kinds of data may have different needs in terms of availability and consistency.
DOI: https://doi.org/10.1109/PDP2018.2018.00052 (publication date: 2018-03-21)
Citations: 0
A Parallel Implementation of WAND on GPUs
Roussian R. A. Gaioso, V. Gil-Costa, H. Guardia, H. Senger
In this paper we propose and evaluate new strategies for parallel top-k query processing on GPUs. Our strategies are based on the document-at-a-time approach and have been implemented and tested with the WAND ranking algorithm. In our first strategy (named homogeneous), the posting lists are evenly partitioned among thread blocks. Our second algorithm, named heterogeneous, partitions the posting lists according to document identifier intervals, so partitions may have different sizes. We also propose three threshold sharing policies, named Local, Safe-R and Safe-WR, which emulate the WAND algorithm's global pruning technique. We evaluated our proposals using AND/OR queries, and the results show that the homogeneous algorithm allows better speedups through higher occupancy of the SMs, but at the cost of a lower recall. The heterogeneous algorithm produces the exact top-k documents and shows promising speedups. Also, the Shared-R and Shared-WR policies for threshold propagation allowed better performance, provided there is enough work per thread block, which proved true for queries composed of at least a few million documents.
DOI: https://doi.org/10.1109/PDP2018.2018.00011 (publication date: 2018-03-21)
Citations: 3
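The difference between the two partitioning strategies described in the WAND abstract can be illustrated with a small CPU-side sketch. It is not the authors' GPU implementation: the function names, the plain-list representation of a posting list and the `max_doc_id` parameter are assumptions made for this example, and scoring and threshold sharing are not modelled.

```python
def partition_evenly(postings, num_blocks):
    """Homogeneous-style split: every block gets a near-equal number of postings."""
    base, extra = divmod(len(postings), num_blocks)
    parts, start = [], 0
    for b in range(num_blocks):
        size = base + (1 if b < extra else 0)
        parts.append(postings[start:start + size])
        start += size
    return parts


def partition_by_docid(postings, num_blocks, max_doc_id):
    """Heterogeneous-style split: block b owns one docID interval, so sizes may differ."""
    width = -(-(max_doc_id + 1) // num_blocks)  # ceiling division
    parts = [[] for _ in range(num_blocks)]
    for doc_id in postings:
        parts[doc_id // width].append(doc_id)
    return parts


postings = [1, 2, 3, 4, 50, 90]  # sorted document identifiers of one term
print(partition_evenly(postings, 3))        # [[1, 2], [3, 4], [50, 90]]
print(partition_by_docid(postings, 3, 99))  # [[1, 2, 3, 4], [50], [90]]
```

The even split balances the work per block, while the interval split keeps all postings of a given document in the same block at the price of unequal partition sizes.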
Efficient NAS Benchmark Kernels with C++ Parallel Programming
Dalvan Griebler, Junior Loff, G. Mencagli, M. Danelutto, L. G. Fernandes
Benchmarking is a way to study the performance of new architectures and parallel programming frameworks. Well-established benchmark suites such as the NAS Parallel Benchmarks (NPB) comprise legacy codes that still lack portability to the C++ language. As a consequence, a set of high-level and easy-to-use C++ parallel programming frameworks cannot be tested with NPB. Our goal is to describe a C++ port of the NPB kernels and to analyze the performance achieved by different parallel implementations written using the Intel TBB, OpenMP and FastFlow frameworks for multi-cores. The experiments show an efficient code port from Fortran to C++ and, on average, an efficient parallelization.
DOI: https://doi.org/10.1109/PDP2018.2018.00120 (publication date: 2018-03-21)
Citations: 30
Divisible Load Scheduling of Image Processing Applications on the Heterogeneous Star Network Using a new Genetic Algorithm
S. Aali, H. Shahhoseini, N. Bagherzadeh
This paper addresses the divisible load scheduling of image processing applications on a heterogeneous star network. In our platform, processors and links have different speeds. Computation and communication overheads are also considered. A new genetic algorithm for minimizing the processing time of low-level image applications using divisible load theory is introduced. Closed-form solutions for the processing time and for the image fractions that should be assigned to each processor are obtained. The optimum number of participating processors and the optimal sequence for load distribution are derived with the new genetic algorithm. The effect of different image and kernel sizes on processing time and speedup is investigated. Finally, several numerical experiments are presented to indicate the efficiency of our algorithm.
DOI: https://doi.org/10.1109/PDP2018.2018.00019 (publication date: 2018-03-21)
Citations: 16
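To make the idea of load fractions concrete, here is a minimal sketch of the simplified textbook model from divisible load theory: a root sends fractions to the processors one after another, each processor starts computing once its fraction has arrived, and all processors finish at the same time. This is not the paper's closed form, which also includes computation and communication overheads; the function names and the parameters `w` (inverse computing speeds), `z` (inverse link speeds), `t_cp` and `t_cm` are assumptions made for this example.

```python
def load_fractions(w, z, t_cp=1.0, t_cm=1.0):
    """Fractions for sequential distribution on a star: no overheads, equal finish times."""
    ratios = [1.0]
    for i in range(1, len(w)):
        # Equal finish times of neighbours i-1 and i give:
        # alpha[i-1] * w[i-1] * t_cp = alpha[i] * (z[i] * t_cm + w[i] * t_cp)
        ratios.append(ratios[-1] * w[i - 1] * t_cp / (z[i] * t_cm + w[i] * t_cp))
    total = sum(ratios)
    return [r / total for r in ratios]  # normalise so the fractions sum to 1


def finish_times(alpha, w, z, t_cp=1.0, t_cm=1.0):
    """Finish time of each processor under the same model."""
    times, sent = [], 0.0
    for a, wi, zi in zip(alpha, w, z):
        sent += a * zi * t_cm               # the root sends fractions one after another
        times.append(sent + a * wi * t_cp)  # then the processor computes its fraction
    return times


w = [1.0, 2.0, 4.0]  # hypothetical processors, fastest first
z = [0.1, 0.2, 0.3]  # hypothetical links
alpha = load_fractions(w, z)
print(alpha)
print(finish_times(alpha, w, z))
```

In this model the order in which processors are served changes the resulting finish time, which is why the abstract treats the distribution sequence and the number of participating processors as quantities to optimize.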
Performance Evaluation of the Metadata-Driven MASi Research Data Management Repository Service
Richard Grunzke, Volker Hartmann, T. Jejkal, H. Kollai, C. Dressler, Julia Dolhoff, Julia Stanek, H. Herold, A. Hoffmann, R. Müller-Pfefferkorn, Torsten Schrade, S. Herres‐Pawlis, G. Meinel, W. Nagel
Research data is increasingly important for gaining insights from scientific data. To optimally foster this, the management of research data is required to be usable, customizable and fast. We enable this by building the MASi research data management repository service, based on the KIT DM framework. The aim is to use a single repository instance to serve multiple arbitrary community use cases. Because their data characteristics are diverse, the MASi service has to perform adequately across the different cases. We evaluate the performance in three initial heterogeneous use cases. Various aspects are investigated: first, the object insertion and query performance of the database as a function of the object fill level; second and third, the ingest and download performance of digital objects using real-life data sets. Highly favorable performance characteristics are shown.
DOI: https://doi.org/10.1109/PDP2018.2018.00059 (publication date: 2018-03-21)
Citations: 0
Characterizing Memory-Latency Sensitivity of Sparse Matrix Kernels
N. Tanabe, Toshio Endo
Intel announced that it would launch a Xeon with high-latency main memory based on 3D XPoint in 2018. This paper presents a performance evaluation of sparse matrix kernels on future supercomputers with high-latency main memory such as 3D XPoint. The authors propose a high-throughput evaluation methodology for exhaustive experiments, which uses the University of Florida sparse matrix collection and/or LIS (a Library of Iterative Solvers for linear systems), among others. The proposed methodology is simple to use, flexible across environments, and high-throughput. With it, the latency sensitivity of SpMV is measured for 208 sparse matrices and ten storage formats in only two days, a task that would take about ten years with conventional simulators. We gained several insights into latency-sensitive kernels, sparse matrices, storage formats, and preconditioners. We observed notable latency sensitivity in some applications, namely Graph500, HPCG and some preconditioners of iterative solvers. We found that the latency sensitivity of SpMV is high for matrices larger than the capacity of the last-level cache. This suggests that main memory using 3D XPoint must be combined with a large DRAM cache.
DOI: https://doi.org/10.1109/PDP2018.2018.00042 (publication date: 2018-03-21)
Citations: 0
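As background for why SpMV can be sensitive to memory latency, the sketch below shows the kernel for the compressed sparse row (CSR) layout, one common storage format. The abstract does not say which ten formats were measured, so CSR is an assumption here, as are the function name `spmv_csr` and the small example matrix.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        # Non-zeros of this row occupy positions row_ptr[row] .. row_ptr[row + 1] - 1.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read through col_idx: an indirect, data-dependent access.
            y[row] += values[k] * x[col_idx[k]]
    return y


# A = [[4, 0, 1],
#      [0, 2, 0],
#      [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
print(spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0]))  # [7.0, 4.0, 18.0]
```

The reads of `values` and `col_idx` are sequential, but the read of `x[col_idx[k]]` jumps wherever the sparsity pattern points, so once the data no longer fits in cache each such access may pay the full main-memory latency.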