
2014 International Conference on High Performance Computing & Simulation (HPCS): Latest Articles

Accelerating outlier detection with intra- and inter-node parallelism
Pub Date : 2014-07-21 DOI: 10.1109/HPCSim.2014.6903723
F. Angiulli, S. Basta, Stefano Lodi, Claudio Sartori
Outlier detection is a data mining task consisting of the discovery of observations that deviate substantially from the rest of the data, and it has many important practical applications. Outlier detection in very large data sets is, however, computationally very demanding, and the size limit of the data that can be processed is pushed considerably forward by combining three ingredients: efficient algorithms, the intra-CPU parallelism of high-performance architectures, and network-level parallelism. In this paper we propose an outlier detection algorithm able to exploit both the internal parallelism of a GPU and the external parallelism of a cluster of GPUs. The algorithm is an evolution of our previous solutions, which considered either GPU or network-level parallelism. We discuss a set of large-scale experiments executed in a supercomputing facility and show the speedup obtained with a varying number of nodes.
Pages: 476-483
Citations: 3
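The abstract does not define the outlier score. A common distance-based definition, assumed here, weights each observation by the sum of its distances to its k nearest neighbours and reports the n points with the largest weight. The brute-force sketch below uses hypothetical function names and is not the authors' code; it only shows the per-point computation such algorithms parallelize, and the paper's algorithm is designed to be far more efficient than this O(N²) loop.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]

# Hypothetical sketch of distance-based outlier scoring; not the authors' algorithm.
def knn_weight(data: List[Point], i: int, k: int) -> float:
    """Sum of distances from data[i] to its k nearest neighbours (excluding itself)."""
    dists = (math.dist(data[i], data[j]) for j in range(len(data)) if j != i)
    return sum(heapq.nsmallest(k, dists))

def top_n_outliers(data: List[Point], k: int, n: int) -> List[Tuple[int, float]]:
    """Return (index, weight) for the n points with the largest kNN weight."""
    # Each weight depends only on the data, never on other weights, so this
    # loop can be split across GPU threads or across cluster nodes.
    weights = [(i, knn_weight(data, i, k)) for i in range(len(data))]
    return heapq.nlargest(n, weights, key=lambda iw: iw[1])

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
print(top_n_outliers(data, k=2, n=1)[0][0])  # → 4
```

Because every point's weight is independent, the outer loop maps naturally onto the two levels of parallelism in the title: threads within a GPU and nodes within a cluster.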
MIPT: Rapid exploration and evaluation for migrating sequential algorithms to multiprocessing systems with multi-port memories
Pub Date : 2014-07-21 DOI: 10.1109/HPCSim.2014.6903767
Gorker Alp Malazgirt, A. Yurdakul, S. Niar
Research has shown that memory load/store instructions account for a significant share of execution time and energy consumption. Extracting the parallelism available at different granularities has been an important approach to designing next-generation highly parallel systems. In this work, we present MIPT, an architecture exploration framework that exploits the instruction-level parallelism of memory and ALU operations in the execution trace of a sequential algorithm. MIPT heuristics recommend memory port sizes and issue slot sizes for memory and ALU operations. Its custom simulator simulates and evaluates the recommended parallel version of the execution trace to measure performance improvements over a dual-port memory. MIPT's architecture exploration criterion is to improve performance by using systems with multi-port memories and multi-issue ALUs. Design exploration tools such as Multi2Sim and Trimaran already exist; these simulators allow multi-port memory architectures to be customized, but the designer's initial starting point is usually unclear. MIPT can therefore suggest an initial starting point for customization in those design exploration systems. In addition, given two different implementations of the same application, the MIPT simulator can compare their execution times.
Pages: 776-783
Citations: 3
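The abstract does not detail MIPT's heuristics or simulator. As a toy model of the question it answers (how many cycles a trace needs for a given number of memory ports and ALU issue slots), the sketch below greedily schedules a dependency-annotated trace, assuming every operation takes one cycle. The names `Op` and `schedule_cycles` and the unit-latency model are hypothetical, not MIPT's.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical toy model; not MIPT's heuristics or simulator.
@dataclass
class Op:
    kind: str                                       # "mem" (load/store) or "alu"
    deps: List[int] = field(default_factory=list)   # indices of earlier ops this op needs

def schedule_cycles(trace: List[Op], mem_ports: int, alu_slots: int) -> int:
    """Greedy list scheduling in trace order, assuming unit latency: each cycle
    issues at most `mem_ports` memory ops and `alu_slots` ALU ops whose
    dependencies completed in an earlier cycle. Returns the total cycle count."""
    if mem_ports < 1 or alu_slots < 1:
        raise ValueError("need at least one memory port and one ALU slot")
    for i, op in enumerate(trace):
        if any(d >= i for d in op.deps):
            raise ValueError("dependencies must point to earlier ops")
    done_at: List[Optional[int]] = [None] * len(trace)
    remaining = list(range(len(trace)))
    cycle = 0
    while remaining:
        cycle += 1
        budget = {"mem": mem_ports, "alu": alu_slots}
        issued = []
        for i in remaining:
            op = trace[i]
            ready = all(done_at[d] is not None and done_at[d] < cycle for d in op.deps)
            if ready and budget[op.kind] > 0:
                budget[op.kind] -= 1
                issued.append(i)
        for i in issued:            # results become visible from the next cycle on
            done_at[i] = cycle
        remaining = [i for i in remaining if done_at[i] is None]
    return cycle

# Hypothetical trace: four loads, a tree of three adds, one store.
trace = [Op("mem"), Op("mem"), Op("mem"), Op("mem"),
         Op("alu", [0, 1]), Op("alu", [2, 3]), Op("alu", [4, 5]), Op("mem", [6])]
print(schedule_cycles(trace, mem_ports=2, alu_slots=1))  # → 5
print(schedule_cycles(trace, mem_ports=4, alu_slots=2))  # → 4
```

Comparing the two calls is the kind of "how much do extra ports and slots buy" estimate the abstract describes, though a real exploration tool would also model latencies and port conflicts.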
An automated infrastructure to support high-throughput bioinformatics
Pub Date : 2014-07-21 DOI: 10.1109/HPCSim.2014.6903742
G. Cuccuru, Simone Leo, L. Lianas, Michele Muggiri, Andrea Pinna, L. Pireddu, P. Uva, A. Angius, G. Fotia, G. Zanetti
The number of domains affected by the big data phenomenon is constantly increasing, both in science and industry, with high-throughput DNA sequencers being among the most massive data producers. Building analysis frameworks that can keep up with such a high production rate, however, is only part of the problem. Current challenges include dealing with articulated data repositories where objects are connected by multiple relationships; managing complex processing pipelines where each step depends on a large number of configuration parameters; and ensuring reproducibility, error control, and usability by non-technical staff. Here we describe an automated infrastructure built to address these issues in the context of analyzing the data produced by the CRS4 next-generation sequencing facility. The system integrates open-source tools, either written by us or publicly available, into a framework that can handle the whole data transformation process, from raw sequencer output to primary analysis results.
Pages: 600-607
Citations: 10
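The abstract does not say how the CRS4 system orders its processing steps. A common way to manage a pipeline in which each step depends on earlier ones is to treat it as a directed acyclic graph and run steps in topological order. The sketch below uses Python's standard `graphlib` module; the step names are hypothetical placeholders, not the steps of the described infrastructure.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical sketch; step names below are illustrative placeholders.
def execution_waves(steps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group pipeline steps into waves: every step in a wave has all of its
    prerequisites in earlier waves, so steps within one wave could run concurrently."""
    ts = TopologicalSorter(steps)   # maps each step to the set of steps it depends on
    ts.prepare()                    # raises graphlib.CycleError on circular dependencies
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        waves.append(ready)
        ts.done(*ready)
    return waves

pipeline = {
    "demultiplex": set(),
    "qc": {"demultiplex"},
    "align": {"demultiplex"},
    "mark_duplicates": {"align"},
    "report": {"qc", "mark_duplicates"},
}
print(execution_waves(pipeline))
# → [['demultiplex'], ['align', 'qc'], ['mark_duplicates'], ['report']]
```

Rejecting cycles up front and exposing independent steps as a single wave are two properties an automated pipeline manager typically needs, whatever engine actually runs the steps.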
Evaluation of vectorization potential of Graph500 on Intel's Xeon Phi
Pub Date : 2014-07-21 DOI: 10.1109/HPCSim.2014.6903668
Milan Stanic, Oscar Palomar, Ivan Ratković, M. Duric, O. Unsal, A. Cristal, M. Valero
Graph500 is a data-intensive application for high-performance computing, and it is an increasingly important workload because graphs are a core part of most analytics applications. So far, no work has examined whether Graph500 is suitable for vectorization, mostly due to a lack of vector memory instructions for irregular memory accesses. The Xeon Phi is a massively parallel processor recently released by Intel with new features such as a wide 512-bit vector unit and vector scatter/gather instructions. The Xeon Phi therefore allows Graph500 to be parallelized more efficiently in combination with vectorization. In this paper we vectorize Graph500 and analyze the impact of vectorization and prefetching on the Xeon Phi. We also show that combining parallelization, vectorization, and prefetching yields a 27% speedup over a parallel version with prefetching that does not leverage the vector capabilities of the Xeon Phi.
Pages: 47-54
Citations: 11
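To see where the irregular memory accesses come from, here is a minimal scalar sketch of a level-synchronous top-down BFS over a graph in CSR (compressed sparse row) form, producing a parent array as the BFS tree. It is plain Python, not the authors' vectorized Xeon Phi code; the comments only mark which loads a vectorizing implementation would turn into gather operations and which stores into scatters.

```python
from typing import List

# Scalar illustration only; not the paper's vectorized implementation.
def bfs_parents(row_ptr: List[int], col_idx: List[int], root: int) -> List[int]:
    """Level-synchronous top-down BFS on a CSR graph.
    Returns parent[v] for every vertex (-1 if unreachable, root is its own parent)."""
    n = len(row_ptr) - 1
    parent = [-1] * n
    parent[root] = root
    frontier = [root]
    while frontier:
        # Gather: col_idx[e] is read at graph-dependent positions, i.e. an
        # irregular access pattern; vector gather instructions can batch these loads.
        edges = [(col_idx[e], v) for v in frontier
                 for e in range(row_ptr[v], row_ptr[v + 1])]
        next_frontier = []
        for w, v in edges:
            if parent[w] == -1:          # scatter: first visitor claims w
                parent[w] = v
                next_frontier.append(w)
        frontier = next_frontier
    return parent

# Undirected graph with edges 0-1, 0-2, 1-3, 2-3, 3-4 in CSR form.
row_ptr = [0, 2, 4, 6, 9, 10]
col_idx = [1, 2, 0, 3, 0, 3, 1, 2, 4, 3]
print(bfs_parents(row_ptr, col_idx, 0))  # → [0, 0, 0, 1, 3]
```

In a vectorized version the duplicate-claim check in the scatter step is the delicate part, since several lanes may try to claim the same vertex in one instruction.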
AIDD: A novel generic attack modeling approach
Pub Date : 2014-07-21 DOI: 10.1109/HPCSim.2014.6903738
Samih Souissi, A. Serhrouchni
In recent years, information systems have become more diverse and complex, making them a privileged target of network and computer attacks. These attacks have increased tremendously, becoming more sophisticated and evolving in unpredictable ways. This work presents an attack model called AIDD (Attacks Identification Description and Defense). It offers a generic attack model to classify, help identify, and defend against computer and network attacks. Our approach takes several attack properties into account in order to simplify attack handling and aggregate defense mechanisms. The originality of our work is that it introduces a target-centric classification, which raises the level of abstraction in order to offer a generic model for describing complex attacks.
Pages: 580-583
Citations: 5
Performance evaluation of a SIP-based constrained peer-to-peer overlay
Pub Date : 2014-07-21 DOI: 10.1109/HPCSim.2014.6903717
S. Cirani, Luca Davoli, Marco Picone, L. Veltri
In recent years, thanks to developments and innovations in hardware and software, the vision of a worldwide network capable of interconnecting both traditional nodes and new Smart Objects (the Internet of Things) is becoming reality. The Internet of Things (IoT) will involve billions of heterogeneous communicating devices using different protocols, in order to enable new forms of interaction between things and people. In this context, owing to scalability, fault-tolerance, and self-configuration requirements, peer-to-peer (P2P) architectures are very appealing in many large-scale IoT scenarios. However, because of the memory, processing, and power limitations of constrained devices, the choice of signaling protocol for maintaining the P2P overlay is critical. In this paper we present a performance evaluation of a real DHT-based P2P overlay in order to understand the benefits, in terms of bandwidth consumption and transmitted/received data, of using a constrained SIP-based protocol, denoted CoSIP, as the P2P signaling protocol.
Pages: 432-435
Citations: 14
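The abstract does not specify which DHT the overlay uses. As a generic illustration of what a DHT-based overlay maintains, the sketch below shows Chord-style successor lookup on a hash ring: node names and resource keys are hashed onto the same identifier space, and each key belongs to the first node clockwise from it. The class, the key format, and the 32-bit truncated SHA-1 identifiers are assumptions for illustration, not details of the evaluated system or of CoSIP.

```python
import bisect
import hashlib
from typing import List, Tuple

# Generic consistent-hashing sketch; not the DHT or signaling used in the paper.
def ring_id(name: str, bits: int = 32) -> int:
    """Map a name onto a 2**bits identifier ring using (truncated) SHA-1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

class HashRing:
    """Successor lookup: a key is stored on the first node whose identifier
    is >= the key's identifier, wrapping around to the smallest identifier."""

    def __init__(self, nodes: List[str]):
        if not nodes:
            raise ValueError("ring needs at least one node")
        self.ids: List[Tuple[int, str]] = sorted((ring_id(n), n) for n in nodes)

    def lookup(self, key: str) -> str:
        k = ring_id(key)
        i = bisect.bisect_left(self.ids, (k, ""))
        return self.ids[i % len(self.ids)][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.lookup("coap://sensor-42/temperature"))  # one of the three nodes
```

A property that suits churn-prone IoT deployments: when a node joins or leaves, only the keys in its arc of the ring move, so the maintenance traffic that a signaling protocol such as CoSIP must carry stays local.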
Analyzing and modeling BitTorrent: A game theory approach
Pub Date : 2014-07-21 DOI: 10.1109/HPCSim.2014.6903718
Farag Azzedin, Mohammed Onimisi Yahaya
Although BitTorrent is gaining popularity and continued usage as a file-sharing system, it is beset by several challenges, which developers and researchers alike are working to address. Exploitation of BitTorrent by free riders is a widely acknowledged problem; in particular, free riders misuse BitTorrent through both optimistic and regular unchoking. In this article, we use game theory to model the BitTorrent choking algorithm and conduct extensive performance evaluation experiments to assess its performance compared with the original BitTorrent choking algorithm.
Pages: 436-443
Citations: 2
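For context on the two mechanisms the abstract says free riders exploit, here is a simplified sketch of one round of a standard BitTorrent-style choker, the baseline being compared against, not the paper's game-theoretic variant. Regular unchoking keeps the interested peers that upload to us fastest (tit-for-tat); optimistic unchoking adds one random interested peer, which lets newcomers bootstrap but also lets a peer that never uploads get served. Three regular slots is a conventional default assumed here; function and peer names are hypothetical.

```python
import random
from typing import Dict, Optional, Set

# Simplified baseline choker; not the algorithm proposed in the paper.
def choose_unchoked(download_rate: Dict[str, float], interested: Set[str],
                    regular_slots: int = 3,
                    rng: Optional[random.Random] = None) -> Set[str]:
    """Return the set of peers to unchoke for this round."""
    rng = rng or random.Random()
    # Regular unchoke: reward the peers currently giving us the best download rate.
    ranked = sorted(interested, key=lambda p: download_rate.get(p, 0.0), reverse=True)
    unchoked = set(ranked[:regular_slots])
    # Optimistic unchoke: one extra interested peer, chosen regardless of its rate.
    candidates = sorted(interested - unchoked)   # sorted so a seeded rng is reproducible
    if candidates:
        unchoked.add(rng.choice(candidates))
    return unchoked

# Hypothetical peers: "d" and "e" upload nothing (free-rider behaviour).
rates = {"a": 50.0, "b": 40.0, "c": 30.0, "d": 0.0, "e": 0.0}
print(sorted(choose_unchoked(rates, set(rates), rng=random.Random(1))))
```

Whichever of "d" or "e" wins the optimistic slot receives data without reciprocating, which is the loophole the abstract refers to.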
The roofline model for oceanic climate applications
Pub Date : 2014-07-21 DOI: 10.1109/HPCSim.2014.6903762
I. Epicoco, S. Mocavero, F. Macchia, G. Aloisio
The present work describes the analysis and optimisation of the PELAGOS025 configuration, which couples NEMO, the physical component for ocean dynamics, with the BFM (Biogeochemical Flux Model), a sophisticated biogeochemical model that can simulate both pelagic and benthic processes. The methodology followed here consists of: a performance analysis of the original parallel code in terms of strong scalability; the identification of the bottlenecks that limit scalability as the number of processes increases; and an analysis of the features of the most computationally intensive kernels through the Roofline model, which provides an insightful visual performance model for multicore architectures and makes it possible to measure and compare the performance of one or more computational kernels run on different hardware architectures.
Pages: 732-737
Citations: 3
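The Roofline model bounds a kernel's attainable performance by the smaller of the machine's peak compute rate and its memory bandwidth multiplied by the kernel's arithmetic intensity (flops performed per byte moved): P = min(P_peak, B × I). The sketch below encodes that bound; the peak and bandwidth figures in the example are hypothetical round numbers, not measurements from the paper's machines.

```python
# Roofline bound; example machine parameters are hypothetical.
def attainable_gflops(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """min(peak compute, bandwidth * arithmetic intensity).
    intensity is in flop/byte and bandwidth in GB/s, so the product is in GFLOP/s."""
    return min(peak_gflops, bandwidth_gbs * intensity)

def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (flop/byte) at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs

# Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
print(ridge_point(1000.0, 100.0))               # → 10.0
print(attainable_gflops(0.25, 1000.0, 100.0))   # → 25.0   (memory-bound stencil-like kernel)
print(attainable_gflops(20.0, 1000.0, 100.0))   # → 1000.0 (compute-bound kernel)
```

Plotting a kernel's measured performance against this bound shows whether optimisation effort should target data movement (left of the ridge point) or computation (right of it), which is how the model supports comparisons across architectures.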
Fault tolerance management in distributed systems: A new leader-based consensus algorithm
Pub Date : 2014-07-21 DOI: 10.1109/HPCSim.2014.6903691
Fouad Hanna, J. Lapayre, L. Droz-Bartholet
It is well known that consensus algorithms are fundamental building blocks for fault-tolerant distributed systems. Many algorithms have been proposed in the consensus literature to solve this problem under different system models, but few attempts have been made to analyze their performance. In this paper we present a new leader-based consensus algorithm (the FLC algorithm) for the crash-stop failure model. Our algorithm uses the leader oracle Ω and adopts a decentralized communication pattern. In addition, we analyze and compare the performance of our algorithm with that of four of the best-known consensus algorithms for asynchronous distributed systems under the crash-stop failure model. Our results give an overall picture of the performance of these algorithms and show that our algorithm performs best when process crashes occur in a system using a multicast network model. Our algorithm also gives very acceptable performance when crashes occur in a unicast network model, and when no process crashes at all.
Pages: 234-242
Citations: 2
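The sketch below is a deliberately idealized, single-round illustration of two ingredients named in the abstract: a leader oracle Ω and decisions backed by a majority in the crash-stop model. It is not the FLC algorithm and not a complete consensus protocol: it assumes synchronous rounds, reliable links, and perfect crash knowledge, none of which holds in the asynchronous systems the paper studies. Function names are hypothetical.

```python
from typing import Dict, Optional, Set

# Toy illustration only; not FLC and not a full consensus protocol.
def omega_leader(processes: Set[int], crashed: Set[int]) -> Optional[int]:
    """Toy stand-in for Ω: trust the lowest-id process that has not crashed.
    A real Ω only guarantees that eventually all correct processes trust the
    same correct leader."""
    alive = processes - crashed
    return min(alive) if alive else None

def toy_leader_round(proposals: Dict[int, str], crashed: Set[int]) -> Optional[str]:
    """The trusted leader proposes its own value; each alive process acknowledges it.
    The value is decided only if the acks form a strict majority of all n processes,
    which is why such algorithms tolerate fewer than n/2 crashes."""
    processes = set(proposals)
    leader = omega_leader(processes, crashed)
    if leader is None:
        return None
    acks = len(processes - crashed)        # crash-stop: crashed processes never reply
    return proposals[leader] if acks > len(processes) // 2 else None

proposals = {1: "x", 2: "y", 3: "z", 4: "w", 5: "v"}
print(toy_leader_round(proposals, set()))      # → x
print(toy_leader_round(proposals, {1}))        # → y    (Ω moves to process 2)
print(toy_leader_round(proposals, {1, 2, 3}))  # → None (only 2 of 5 alive)
```

Real leader-based protocols add ballot or round numbers so that a new leader can safely recover any value a crashed leader may already have had accepted; this toy omits that entirely.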
Distributed scheduling and data sharing in late-binding overlays
Pub Date : 2014-07-21 DOI: 10.1109/HPCSim.2014.6903678
A. D. Peris, J. Hernández, E. Huedo
Pull-based late-binding overlays are used in some of today's largest computational grids. Job agents are submitted to resources with the task of retrieving the real workload from a central queue at runtime. This helps overcome the problems of these complex environments: heterogeneity, imprecise status information, and relatively high failure rates. In addition, late job assignment allows dynamic adaptation to changes in grid conditions or user priorities. However, as the scale grows, the central assignment queue may become a bottleneck for the whole system. This article presents a distributed scheduling architecture for late-binding overlays that addresses this issue by letting execution nodes build a distributed hash table and delegating job matching and assignment to them. This reduces the load on the central server and makes the system much more scalable and robust. Scalability makes fine-grained scheduling possible and enables new functionality, such as a distributed data cache on the execution nodes, which helps relieve the commonly congested grid storage services.
Pages: 129-136
Citations: 7
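To make "pull-based late binding" concrete, here is a minimal sketch of the centralized baseline the abstract starts from: an agent already running on a worker node asks a single queue for the first job whose requirements its node satisfies, so a job is bound to a resource only at that moment. The `Job`/`Agent` types and the memory-only matching rule are hypothetical simplifications, not the paper's matching scheme.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional

# Hypothetical sketch of centralized pull-based matching; not the paper's architecture.
@dataclass
class Job:
    job_id: str
    min_memory_gb: int      # hypothetical requirement used for matching

@dataclass
class Agent:
    name: str
    memory_gb: int          # resources the agent discovered on its node at runtime

def pull_job(queue: Deque[Job], agent: Agent) -> Optional[Job]:
    """Late binding: hand the agent the first queued job it can actually run,
    removing it from the queue; return None if nothing matches."""
    for i, job in enumerate(queue):
        if job.min_memory_gb <= agent.memory_gb:
            del queue[i]    # safe: we return immediately after mutating
            return job
    return None

queue = deque([Job("j1", 16), Job("j2", 4), Job("j3", 8)])
small = Agent("worker-small", 8)
print(pull_job(queue, small).job_id)  # → j2
print(pull_job(queue, small).job_id)  # → j3
print(pull_job(queue, small))         # → None (j1 needs 16 GB)
```

Every agent scanning one shared queue is exactly the scalability bottleneck the paper targets; in its architecture, matching and assignment are instead delegated to the execution nodes organized in a distributed hash table.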