2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP): Latest Papers

Efficient and Fast Approximate Consensus with Epidemic Failure Detection at Extreme Scale
Amogh Katti, D. Lilja
This paper proposes a memory-efficient failure detection and consensus algorithm, based on epidemic protocols, for fail-stop process failures. It is suitable for extreme-scale systems with reliable networks (no message loss) and high failure frequency. Communication time dominates the execution time at scale. The redundant failure detections and non-uniform information dissemination speed of epidemic algorithms make approximate epidemic-based consensus detection a useful way to trade accuracy for reduced communication overhead. An approximate consensus detection technique is also proposed in this paper for faster consensus detection. Results show that the algorithm detects consensus on failed processes correctly, with logarithmic scalability. The algorithm tolerates process failures both before and during execution, and the number of failures (occurring both before and during execution) has virtually no effect on the consensus detection time at scale. Comparison with a similar deterministic consensus detection technique shows that the algorithm detects consensus at the same time with high probability. Further, the benefits of the proposed approximate technique increase as system size increases. Compared to the non-approximate version, for a system size of 2^18 processes, the communication saved is 34%, with an accuracy loss of the order of 10^-4 in consensus detection.
DOI: https://doi.org/10.1109/PDP2018.2018.00045 | Published: 2018-03-21
Citations: 4
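The logarithmic scalability reported in the Katti and Lilja abstract is characteristic of epidemic (gossip) dissemination in general. The sketch below is not the authors' algorithm; it is a minimal push-gossip simulation, assuming a reliable network and one random target per informed process per round, that counts how many rounds a single piece of information (for example, a detected failure) needs to reach every process. The names `gossip_rounds` and `fanout` are hypothetical and chosen only for illustration.

```python
import random


def gossip_rounds(n, seed=0, fanout=1):
    """Count rounds until information held by process 0 reaches all n processes.

    In each round, every informed process pushes the information to
    `fanout` peers chosen uniformly at random (push-style gossip).
    Messages are assumed never to be lost.
    """
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n:
        newly_informed = set()
        for _sender in informed:
            for _ in range(fanout):
                newly_informed.add(rng.randrange(n))
        informed |= newly_informed
        rounds += 1
    return rounds


if __name__ == "__main__":
    # The round count should grow roughly with log(n), not with n.
    for exponent in (6, 8, 10, 12):
        n = 2 ** exponent
        print(n, gossip_rounds(n))
```

With `fanout=1` the informed set can at most double per round, so at least log2(n) rounds are always needed; the simulation shows how close random gossip stays to that bound.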
Integrating Learning, Optimization, and Prediction for Efficient Navigation of Swarms of Drones
Amin Majd, A. Ashraf, E. Troubitsyna, M. Daneshtalab
Swarms of drones are increasingly being used in a variety of monitoring and surveillance, search and rescue, and photography and filming tasks. However, despite the growing popularity of swarm-based applications of drones, there is still a lack of approaches to generate efficient drone routes while minimizing the risks of drone collisions. In this paper, we present a novel approach that integrates learning, optimization, and prediction for generating efficient and safe routes for swarms of drones. The proposed approach comprises three main components: (1) a high-performance dynamic evolutionary algorithm for optimizing drone routes, (2) a reinforcement learning algorithm for incorporating the feedback and runtime data about the system state, and (3) a prediction approach to predict the movement of drones and moving obstacles in the flying zone. We also present a parallel implementation of the proposed approach and evaluate it against two benchmarks. The results demonstrate that the proposed approach significantly reduces route lengths and computation overhead while producing efficient and safe routes.
DOI: https://doi.org/10.1109/PDP2018.2018.00022 | Published: 2018-03-21
Citations: 24
Clustering Goes Big: CLUBS-P, an Algorithm for Unsupervised Clustering Around Centroids Tailored For Big Data Applications
M. Ianni, E. Masciari, G. Mazzeo, C. Zaniolo
The need to support advanced analytics on Big Data is driving data scientists' interest toward massively parallel distributed systems and software platforms, such as MapReduce and Spark, that make their scalable utilization possible. However, when complex data mining algorithms are required, their fully scalable deployment on such platforms faces a number of technical challenges that grow with the complexity of the algorithms involved. Thus, algorithms that were originally designed for sequential execution must often be redesigned in order to use the distributed computational resources effectively. In this paper, we explore these problems and then propose a solution which has proven to be very effective on the complex hierarchical clustering algorithm CLUBS+. We present a parallel version of CLUBS+ named CLUBS-P, with an ad-hoc implementation based on message passing: CLUBS-MP.
DOI: https://doi.org/10.1109/PDP2018.2018.00094 | Published: 2018-03-21
Citations: 6
Storage for Advanced Scientific Use-Cases and Beyond
P. Millar, Olufemi Adeyemi, G. Behrmann, P. Fuhrmann, V. Garonne, Dmitry Litvinsev, T. Mkrtchyan, A. Rossi, M. Sahakyan, Jürgen Starek
The dCache project provides open-source storage software deployed internationally to satisfy ever more demanding scientific storage requirements. Its multifaceted approach provides an integrated way of supporting different use-cases with the same storage, from high throughput data ingest, through wide access and easy integration with existing systems. In this paper, we describe some of the recent features that facilitate the use of storage to maximise the gain from stored data, including quality-of-service management, heterogeneous systems — both through integrated tertiary storage support and geographical locality — the parallel NFS (pNFS) extension, and innovative delegated authorisation schemes.
DOI: https://doi.org/10.1109/PDP2018.2018.00109 | Published: 2018-03-21
Citations: 4
Leveraging Compute Clusters for Large-Scale Parametric Screens of Reaction-Diffusion Systems
Md. Shahriar Karim, H. Othmer, David M. Umulis
Reaction-diffusion (RD) models are widely used to study the spatio-temporal evolution of pattern formation during development. Nonlinear RD models are often analytically intractable, and require numerical solution methods. Interrogation of RD models for a large physiological range of parameters covers many orders of magnitude, establishing situations where solutions are stiff and solvers fail to provide accurate results to the time-dependent problem. The spatial dependence of these parameters, and the nonlinearity of the underlying dynamics, impose additional challenges. We developed an efficient approach for simulating stiff RD models of pattern formation and we used supercomputer clusters to carry out a large screen of spatially varying parameters. The proposed approach generated data for screening of RD systems within a reasonable amount of time (a few days), which scales down further if additional cluster nodes are available. The approaches outlined herein are applicable to any systems biology problem requiring numerical approximation of RD equations with spatially non-uniform properties and stiff nonlinear reactions.
DOI: https://doi.org/10.1109/PDP2018.2018.00116 | Published: 2018-03-16
Citations: 0
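The Karim, Othmer and Umulis abstract notes that stiff reaction-diffusion problems with spatially varying parameters defeat standard solvers. The sketch below is not the authors' method; it illustrates one common remedy, an implicit (backward Euler) time step, on the simplest linear case u_t = D u_xx - k(x) u in one dimension with zero-flux boundaries. An explicit forward Euler step on the same grid would need dt <= dx^2 / (2 D) to remain stable, whereas the implicit step stays stable for any dt. The names `solve_tridiagonal` and `implicit_step`, and the choice of equation, are assumptions made for illustration.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    lower[i] multiplies x[i-1], diag[i] multiplies x[i], and upper[i]
    multiplies x[i+1]; lower[0] and upper[-1] should be 0.0.
    """
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def implicit_step(u, dt, dx, diffusion, decay):
    """One backward-Euler step of u_t = D u_xx - k(x) u with zero-flux ends.

    `decay` is a list giving the spatially varying rate k(x) at each grid
    point; `u` must contain at least two grid points.
    """
    n = len(u)
    r = diffusion * dt / dx ** 2
    lower = [-r] * n
    upper = [-r] * n
    diag = [1.0 + 2.0 * r + dt * decay[i] for i in range(n)]
    # Zero-flux boundaries: each end point has only one neighbour.
    diag[0] -= r
    diag[-1] -= r
    lower[0] = 0.0
    upper[-1] = 0.0
    return solve_tridiagonal(lower, diag, upper, u)


if __name__ == "__main__":
    u0 = [0.0] * 50
    u0[25] = 1.0
    # dt is far above the explicit stability limit dx^2 / (2 D) = 0.005.
    u1 = implicit_step(u0, dt=10.0, dx=0.1, diffusion=1.0, decay=[0.0] * 50)
    print(sum(u1), min(u1))
```

Because each point in a parameter screen is an independent run of such a solver, the screen itself parallelizes trivially across cluster nodes, which is consistent with the scaling behaviour the abstract describes.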
Parallel Simulation of Sinoatrial Node Cells Synchronization
Aurelio Nicolas Mata, N. P. Castellanos-Abrego, G. Román-Alonso, M. Castro-García, G. Garza, J. R. Godínez-Fernández
The sinoatrial node (SAN) has the function of initiating a depolarizing wave that runs throughout the heart. This wave produces the muscular contraction necessary for pumping blood in animals. In recent years, several works have been proposed to simulate the electric potential behaviour of a single sinoatrial cell (SANC) and of groups of cells, which requires solving a set of differential equations for each microsecond of simulation. A major drawback arises when millions of SANCs must be synchronized, since this involves a huge processing time. Since the simulation of the behavior of a set of cells is an open research topic, it is important to propose efficient tools to reduce response times; unfortunately, because of the complexity of the existing SANC models, very little work has been done to this end. This paper proposes three parallel algorithms to simulate the synchronization of a set of SANCs based on the model of Severi (2012). The proposed approaches are built using OpenMP, MPI, and CUDA, in order to compare the benefits given by different computing platforms. We found that all parallel versions perform better when one cell is defined per processing unit; however, the CUDA version gives the best results in scalability and performance.
DOI: https://doi.org/10.1109/PDP2018.2018.00025 | Published: 2018-03-01
Citations: 3
Solving Sparse Triangular Linear Systems in Modern GPUs: A Synchronization-Free Algorithm
Ernesto Dufrechu, P. Ezzatti
Sparse triangular linear systems are ubiquitous in a wide range of science and engineering fields, and represent one of the most important building blocks of Sparse Numerical Linear Algebra methods. For this reason, their parallel solution has been the subject of exhaustive study, and efficient implementations of this kernel can be found for almost every hardware platform. However, the strong data dependencies that serialize a great deal of the execution and the load imbalance inherent to the triangular structure pose serious difficulties for its parallel performance, especially in the context of massively parallel processors such as GPUs. To this day, the most widespread GPU implementation of this kernel is the one distributed in the NVIDIA CUSPARSE library, which relies on a preprocessing stage to determine the parallel execution schedule. Although the solution phase is highly efficient, this strategy pays the cost of constant synchronizations with the CPU. In this work, we present a synchronization-free GPU algorithm to solve sparse triangular linear systems for the CSR format. The experimental evaluation shows performance improvements over CUSPARSE and a recently proposed synchronization-free method for the CSC matrix format.
DOI: https://doi.org/10.1109/PDP2018.2018.00034 | Published: 2018-03-01
Citations: 21
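The data dependencies mentioned in the Dufrechu and Ezzatti abstract are easiest to see in a sequential reference solver. The sketch below is not the paper's GPU algorithm; it is plain forward substitution for a lower triangular matrix stored in CSR, assuming every row stores a nonzero diagonal entry. The function name `csr_lower_solve` and the argument names `indptr`, `indices` and `data` are conventional CSR labels used here for illustration.

```python
def csr_lower_solve(indptr, indices, data, b):
    """Solve L x = b by forward substitution, with L lower triangular in CSR.

    Row i occupies positions indptr[i]:indptr[i + 1] of `indices` (column
    numbers) and `data` (values). Column order within a row is arbitrary.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diagonal = None
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j == i:
                diagonal = data[k]
            elif j < i:
                # Row i cannot finish before x[j] is known: this is the
                # dependency that limits parallel execution.
                acc -= data[k] * x[j]
            else:
                raise ValueError("matrix is not lower triangular")
        if diagonal is None or diagonal == 0:
            raise ValueError(f"row {i} has no nonzero diagonal entry")
        x[i] = acc / diagonal
    return x


if __name__ == "__main__":
    # L = [[2, 0, 0],
    #      [1, 3, 0],
    #      [0, 4, 5]]
    indptr = [0, 1, 3, 5]
    indices = [0, 0, 1, 1, 2]
    data = [2.0, 1.0, 3.0, 4.0, 5.0]
    print(csr_lower_solve(indptr, indices, data, [2.0, 4.0, 14.0]))
```

Rows whose off-diagonal columns are all already solved can be processed concurrently; a preprocessing stage that groups such rows into levels is one way to build the execution schedule the abstract refers to, while a synchronization-free approach has to resolve the same dependencies at run time.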
Low-Power Storage Bricks and Bioinformatics on Systems-On-Chip
L. Morganti, D. Cesini, Elena Corni, Luca Lama, Carmelo Pellegrino, I. Merelli, D. D'Agostino
Low-power Systems on Chip (SoCs) derived from the embedded and mobile market can be profitably used to execute scientific workloads traditionally designed for power-hungry clusters, saving energy, gaining portability and reducing infrastructural costs and sizes. We investigate the possibility of using SoCs as storage bricks of a BeeGFS filesystem in the perspective of energy-efficient storage solutions supporting scientific computing. Then, we consider a use case from metagenomics analysis and show how the large amount of genome sequencing information streamed by portable sequencing devices could be managed by low-power SoCs making use of an underlying BeeGFS filesystem.
DOI: https://doi.org/10.1109/PDP2018.2018.00106 | Published: 2018-03-01
Citations: 1
An Adaptive Cloud-Based IoT Back-end Architecture and Its Applications
A. Marosi, Attila Farkas, R. Lovas
The Internet of Things (IoT) is playing an increasingly fundamental role in a wide range of sectors, including industry, agriculture, health care, and other services. In many cases, cloud computing serves as an elastic and efficient paradigm for implementing IoT back-ends. With the emerging lightweight software container technologies, the feasible approaches and design options for such IoT back-ends have been significantly enriched. In our paper we present the evolution of an IoT back-end, which is responsible for collecting (among others) meteorological, image and soil data from cultivated fields in order to enable precision farming. The different versions, namely the cloud VM-based and the Docker containerized variants, provide highly scalable and vendor-independent (cloud provider agnostic) solutions; they can therefore form a robust and adaptive framework for further pilot application areas, e.g. Connected Cars and Industry 4.0. The benchmarks presented in the paper illustrate the throughput and other parameters of the current implementation.
DOI: https://doi.org/10.1109/PDP2018.2018.00087 | Published: 2018-03-01
Citations: 7
pWebDAV: A Multi-Tier Storage System
Christos Filippidis, Y. Cotronis
Experiments using the Large Hadron Collider (LHC) currently generate tens of petabytes of reduced data per year, observational and simulation data in the climate domain is expected to reach exabytes by 2021, and light source experiments are expected to generate hundreds of terabytes per day. At such extreme scale, the substantial amount of concurrency can cause critical contention in the I/O system. This study introduces pWebDAV, a heterogeneous, multi-tier storage system. pWebDAV proposes a dynamically coordinated I/O architecture that offers overall data flow solutions (remote-local access). The fundamental idea is to implement I/O policies on the fly for each data transfer. pWebDAV directly controls all I/O nodes participating in the data transfer, regardless of tier. The pWebDAV approach can fully utilize the provided I/O and network resources and is able to minimize disk and network contention. This study focuses on the scalability of the metadata node.
DOI: https://doi.org/10.1109/PDP2018.2018.00108 (published 2018-03-01)
Citations: 0