2015 International Conference on High Performance Computing & Simulation (HPCS): Latest Papers

Towards energy-efficient linear algebra with an ATLAS library tuned for energy consumption
Pub Date : 2015-07-20 DOI: 10.1109/HPCSim.2015.7237022
Jens Lang, G. Rünger, P. Stocker
Autotuning is an established method for adapting the execution of an application to the underlying hardware in order to minimise execution time. This article investigates whether autotuning is also suitable for minimising the energy consumption of an application. The investigation is carried out with the linear algebra library ATLAS. Adaptations to the ATLAS package that enable energy autotuning are proposed. Different tuning parameters are examined to determine whether they behave differently when ATLAS is tuned for energy consumption instead of execution time. The results suggest that some tuning parameters have to be set differently when ATLAS is meant to run with minimum energy consumption rather than minimum execution time. The results further indicate that tuning the complete ATLAS package for energy consumption leads to a more energy-efficient execution than tuning it for execution time.
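The difference between the two tuning objectives can be illustrated with a minimal sketch. It is not the ATLAS search itself: the `Measurement` type, the `block_size` parameter, and all numbers are invented for illustration, chosen only so that the fastest variant and the most energy-efficient variant differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One hypothetical benchmark run of a kernel variant."""
    block_size: int   # illustrative tuning parameter value
    seconds: float    # measured execution time
    avg_watts: float  # measured average power draw

    @property
    def joules(self) -> float:
        # Energy is average power multiplied by run time.
        return self.avg_watts * self.seconds


def best_by(measurements, key):
    """Return the measurement that minimises the given objective."""
    return min(measurements, key=key)


# Invented numbers (assumption): roughly 72 J, 80 J and 77 J respectively.
RUNS = [
    Measurement(block_size=32, seconds=1.20, avg_watts=60.0),
    Measurement(block_size=64, seconds=1.00, avg_watts=80.0),
    Measurement(block_size=128, seconds=1.10, avg_watts=70.0),
]

fastest = best_by(RUNS, key=lambda m: m.seconds)
thriftiest = best_by(RUNS, key=lambda m: m.joules)
```

With these made-up measurements the time objective selects block size 64, while the energy objective selects block size 32, which is the kind of divergence the abstract reports for some parameters.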
Citations: 5
Many-core CPUs can deliver scalable performance to stochastic simulations of large-scale biochemical reaction networks
Pub Date : 2015-07-20 DOI: 10.1109/HPCSim.2015.7237084
Elias Kouskoumvekakis, D. Soudris, E. Manolakos
Stochastic simulation of large-scale biochemical reaction networks is becoming essential for Systems Biology. It enables the in-silico investigation of complex biological system dynamics under different conditions and intervention strategies, while also taking into account the inherent “biological noise” that is especially present in the low species count regime. It is, however, a great computational challenge, since in practice we need to execute many repetitions of a complex simulation model to assess the average-case and extreme-case behavior of the dynamical system it represents. The problem's work scales quickly with the number of repetitions required and the number of reactions in the bio-model. The worst-case scenario is when there is a need to run thousands of repetitions of a complex model with thousands of reactions. We have developed a stochastic simulation software framework for many- and multi-core CPUs. It is evaluated using Intel's experimental many-core Single-chip Cloud Computer (SCC) CPU and the latest-generation consumer-grade Intel Core i7 multi-core CPU, running Gillespie's First Reaction Method exact stochastic simulation algorithm. It is shown that emerging many-core NoC processors can provide scalable performance, achieving linear speedup as simulation work scales in both dimensions.
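As a rough illustration of the algorithm named in the abstract, the following is a minimal sequential sketch of the First Reaction Method. It is not the authors' framework: the toy model (A -> B with rate constant 0.5), the function names, and the seeding scheme are all assumptions made for the example.

```python
import math
import random


def first_reaction_step(state, reactions, rng):
    """Pick the next reaction by drawing one tentative time per reaction.

    state     -- dict mapping species name to molecule count
    reactions -- list of (propensity_function, state_change_dict) pairs
    Returns (tau, index) for the earliest reaction, or None if none can fire.
    """
    best_tau, best_index = math.inf, None
    for index, (propensity, _change) in enumerate(reactions):
        rate = propensity(state)
        if rate <= 0.0:
            continue  # this reaction cannot fire in the current state
        tau = rng.expovariate(rate)  # exponentially distributed waiting time
        if tau < best_tau:
            best_tau, best_index = tau, index
    return None if best_index is None else (best_tau, best_index)


def simulate(initial_state, reactions, t_end, seed):
    """Run one repetition until t_end or until no reaction can fire."""
    rng = random.Random(seed)
    state = dict(initial_state)
    t = 0.0
    while True:
        step = first_reaction_step(state, reactions, rng)
        if step is None or t + step[0] > t_end:
            return state
        tau, index = step
        t += tau
        for species, delta in reactions[index][1].items():
            state[species] += delta


# Toy model (assumption for illustration): A -> B with rate constant 0.5.
REACTIONS = [(lambda s: 0.5 * s["A"], {"A": -1, "B": +1})]

# Repetitions are independent of each other, so each could run on its own core.
results = [simulate({"A": 20, "B": 0}, REACTIONS, t_end=2.0, seed=k)
           for k in range(100)]
mean_b = sum(r["B"] for r in results) / len(results)
```

The two scaling dimensions mentioned in the abstract show up here as the length of `reactions` (work per step) and the number of repetitions (the outer list comprehension).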
Citations: 2
Quartiles and Mel Frequency Cepstral Coefficients vectors in Hidden Markov-Gaussian Mixture Models classification of merged heart sounds and lung sounds signals
Pub Date : 2015-07-20 DOI: 10.1109/HPCSim.2015.7237053
P. Mayorga, D. Ibarra, V. Zeljkovic, C. Druzgalski
This paper presents integrated Hidden Markov and Gaussian Mixture Models (HMM-GMM) to classify the characteristics of lung sounds (LS) and heart sounds (HS). To optimize the models' size, several methodologies encompassing dendrograms, silhouettes and the Bayesian Information Criterion (BIC) were applied. The experiments were carried out by extracting features from the LS and HS with MFCC (Mel-Frequency Cepstral Coefficients) vectors and Quantile vectors, specifically Quartiles. Overall, the merged HMM-GMM architecture using Quartiles offered consistent classification results. In both types of vectors, a high degree of classification efficiency was obtained, reaching up to 96% for the studied sets of signals. For MFCC the classification results were not conclusive. An assessment of the number of clusters using dendrograms, silhouettes, and BIC was linked with the models' size. Consequently, this makes it possible to enhance the efficiency of merged HMM-GMM models in the diagnostic classification of cardiopulmonary acoustic signals.
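To make the notion of a Quartile feature vector concrete, here is a small sketch that cuts a signal into frames and summarises each frame by its three quartiles. The frame length, the non-overlapping framing, and the function names are assumptions for illustration; the abstract does not specify how the authors framed their signals.

```python
import statistics


def quartile_vector(frame):
    """Return (Q1, Q2, Q3) of one frame of samples."""
    q1, q2, q3 = statistics.quantiles(frame, n=4)
    return (q1, q2, q3)


def frame_signal(samples, frame_len):
    """Split samples into consecutive non-overlapping frames.

    Samples that do not fill a complete final frame are dropped.
    """
    usable = len(samples) - len(samples) % frame_len
    return [samples[i:i + frame_len] for i in range(0, usable, frame_len)]


def quartile_features(samples, frame_len):
    """One quartile vector per frame; a stand-in for a feature sequence."""
    return [quartile_vector(f) for f in frame_signal(samples, frame_len)]


# Invented toy signal (assumption): two full frames of 7 samples plus one leftover.
signal = [1, 2, 3, 4, 5, 6, 7, 7, 6, 5, 4, 3, 2, 1, 9]
features = quartile_features(signal, frame_len=7)
```

A sequence of such three-element vectors is far more compact than a typical MFCC sequence, which is one plausible reason to compare the two representations.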
Citations: 7
A lexical approach for classifying malicious URLs
Pub Date : 2015-07-20 DOI: 10.1109/HPCSim.2015.7237040
Michael Darling, G. Heileman, Gilad Gressel, Aravind Ashok, P. Poornachandran
Given the continuous growth of malicious activities on the internet, there is a need for intelligent systems to identify malicious web pages. It has been shown that URL analysis is an effective tool for detecting phishing, malware, and other attacks. Previous studies have performed URL classification using a combination of lexical features, network traffic, hosting information, and other strategies. These approaches require time-intensive lookups, which introduce significant delay in real-time systems. In this paper, we describe a lightweight approach for classifying malicious web pages using URL lexical analysis alone. Our goal is to explore the upper bound of the classification accuracy of a purely lexical approach. We also aim to develop a scalable approach that could be used in a real-time system. We develop a classification system based on lexical analysis of URLs. It correctly classifies URLs of malicious web pages with 99.1% accuracy, a 0.4% false positive rate, and an F1-Score of 98.7, taking 0.62 milliseconds on average. Our method also outperforms similar approaches when classifying out-of-sample data.
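A purely lexical approach needs nothing but the URL string, which is why it avoids network lookups. The sketch below shows what such feature extraction might look like. The specific features are assumptions chosen for illustration and are not the feature set used in the paper; only the standard-library `urllib.parse.urlsplit` and `re` are used.

```python
import re
from urllib.parse import urlsplit

# Matches a dotted-quad host such as 192.168.0.1 (no range check on octets).
_IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def lexical_features(url: str) -> dict:
    """Compute a few illustrative lexical features from the URL string alone."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    tokens = [t for t in re.split(r"[^A-Za-z0-9]+", url) if t]
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "host_is_ipv4": bool(_IPV4.match(host)),
        "path_depth": len([s for s in parts.path.split("/") if s]),
        "longest_token": max((len(t) for t in tokens), default=0),
    }


suspicious = lexical_features("http://192.168.0.1/login-update/index.php")
plain = lexical_features("https://example.com/")
```

A dictionary like this could be turned into a numeric vector and passed to any standard classifier; because every feature is computed from the string itself, the per-URL cost stays small.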
Citations: 55
Explaining disease using big data: How valid is your pathway?
Pub Date : 2015-07-20 DOI: 10.1109/HPCSim.2015.7237114
Bas Stringer, Maurits J. J. Dijkstra, K. Feenstra, Sanne Abeln, J. Heringa
The design of solutions to current societal challenges in human health, healthcare and nutrition, and to the sustainable production of food, feed and energy, requires academic innovations and industrial activity based on life science R&D in its broadest sense. The diversity of on-going programs shows that public-private collaboration is increasing in each of these sectors. A few examples in The Netherlands alone include the Dutch Techcenter for Life Sciences (DTL), CTMM-TraIT (TransMart, Open Clinica), NFU Data 4 Lifesciences initiative, Onco-XL, Parelsnoer, Centre for Personalized Cancer Treatment (CPCT) and Philips' Health-Suite Digital Platform in the Life Science & Health sector; Breed4Food and TIFN in Agri&Food; Virtual Lab for Plant Breeding, Seed Valley and “Tuinbouw Digitaal” in Horticulture; and BeBasic in Biobased Economy.
Citations: 1
Straightforward modeling of fully-connected dragonfly topologies in HPC-system simulators
Pub Date : 2015-07-20 DOI: 10.1109/HPCSim.2015.7237037
P. Yébenes, P. García, F. Quiles, J. Escudero-Sahuquillo
HPC systems are growing in the number of components that have to be interconnected in an efficient way. For that reason, network design has become a key issue in the development of these systems, especially when they are made of thousands of elements. In order to maximize the performance achieved by the network at an affordable cost, new network topologies have been proposed in recent years. Among them, one of the most popular is the dragonfly topology, which benefits from high-radix switches. As it is not affordable to test these topologies in large real systems, simulation is widely used. In that sense, simulation frameworks are used to avoid the problems and costs of developing a simulator from scratch, as well as to ease the design of new models. OMNeT++ is one of the most prominent simulation frameworks and is widely accepted for modeling large networks. This paper focuses on the modeling of fully-connected dragonfly topologies and its implementation in generic HPC-system simulators. First, we explain in detail the modeling of the dragonfly interconnection pattern. Next, we describe the modeling of the minimal-path routing algorithm that fits the proposed pattern, as well as the mechanism required for avoiding deadlocks. We also describe the basics of the implementation of the proposed model in an OMNeT++-based simulator. Finally, by means of a set of experiments carried out under several dragonfly configurations, we show performance results obtained from the simulator that implements our dragonfly model, and we compare them with results shown in other papers for validation purposes. Although this evaluation has been made using an OMNeT++-based simulator, the modeled interconnection pattern and routing algorithm can be adapted to any simulation tool.
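The idea of a fully-connected dragonfly and its minimal-path routing can be sketched in a few lines. The sketch assumes `a` switches per group, `h` global ports per switch, and therefore `a * h + 1` groups, with exactly one global link between every pair of groups. The particular link arrangement used here (global port `k` of group `G` leads to group `(G + k + 1) mod g`) is an assumption chosen because it is easy to verify; the abstract does not state which arrangement the paper models, and deadlock avoidance is not covered.

```python
def num_groups(a, h):
    """Number of groups in a fully-connected dragonfly."""
    return a * h + 1


def global_link(src_group, dst_group, a, h):
    """Return (switch, port) inside src_group that owns the link to dst_group."""
    if src_group == dst_group:
        raise ValueError("no global link inside a group")
    k = (dst_group - src_group - 1) % num_groups(a, h)
    return k // h, k % h


def minimal_path(src, dst, a, h):
    """Minimal route between switches given as (group, switch) pairs.

    At most three hops: local, global, local.
    """
    src_group, src_switch = src
    dst_group, _dst_switch = dst
    path = [src]
    if src_group != dst_group:
        exit_switch, _ = global_link(src_group, dst_group, a, h)
        if exit_switch != src_switch:
            path.append((src_group, exit_switch))   # local hop to the exit switch
        entry_switch, _ = global_link(dst_group, src_group, a, h)
        path.append((dst_group, entry_switch))      # global hop between groups
    if path[-1] != dst:
        path.append(dst)                            # local hop inside the destination group
    return path


# Example configuration (assumption): 4 switches per group, 2 global ports each.
A, H = 4, 2
route = minimal_path((0, 0), (5, 0), A, H)
```

For this configuration there are nine groups, and the example route passes through the switch in group 0 that owns the link to group 5 before crossing to group 5 and finishing with a local hop.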
Citations: 0
Accurately modeling the GPU memory subsystem
Pub Date : 2015-07-20 DOI: 10.1109/HPCSim.2015.7237038
F. Candel, S. Petit, J. Sahuquillo, J. Duato
Nowadays, research on GPU processor architecture is extraordinarily active, since these architectures offer much more performance per watt than CPU architectures. This is the main reason why massive deployment of GPU multiprocessors is considered one of the most feasible solutions to attain exascale computing capabilities. In this context, ongoing GPU architecture research is required to improve GPU programmability as well as to integrate CPU and GPU cores in the same die. One of the most important research topics in current GPUs is the GPU memory hierarchy, since its design goals are very different from those of conventional CPU memory hierarchies. To explore novel designs that better support General Purpose computing in GPUs (GPGPU computing) and to improve the performance of GPU and CPU/GPU systems, researchers often require advanced microarchitectural simulators with detailed models of the memory subsystem. Nevertheless, due to the speed at which current GPU architectures evolve, the simulation accuracy of existing state-of-the-art simulators suffers. This paper focuses on accurately modeling the GPU memory subsystem. We identified three main aspects that should be modeled with more accuracy: i) miss status holding registers, ii) coalescing vector memory requests, and iii) non-blocking GPU stores. We extend the Multi2Sim heterogeneous CPU/GPU processor simulator to model these aspects with enough accuracy. Experimental results show that if these aspects are not considered in the simulation framework, performance deviations in some applications can rise up to 70%, 75%, and 60%, respectively.
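Of the three aspects listed, coalescing is the easiest to show in isolation: the per-thread addresses of one vector memory instruction are merged so that each cache line is requested only once. The sketch below is a simplified illustration and not Multi2Sim code; the 64-byte line size and the function name are assumptions.

```python
def coalesce(addresses, line_size=64):
    """Group per-thread byte addresses by the cache line they fall into.

    Returns a dict mapping each line base address to the list of thread
    addresses it serves, in first-touch order.
    """
    requests = {}
    for addr in addresses:
        base = (addr // line_size) * line_size
        requests.setdefault(base, []).append(addr)
    return requests


# 16 threads reading consecutive 4-byte words: one line covers them all.
unit_stride = coalesce([4 * i for i in range(16)])

# 16 threads reading with a 64-byte stride: every thread hits a different line.
wide_stride = coalesce([64 * i for i in range(16)])
```

The unit-stride access collapses to a single memory request while the wide-stride access produces sixteen, so a simulator that does not model coalescing would treat both patterns alike and misjudge memory traffic.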
Citations: 5
MS3: A Mediterranean-stile job scheduler for supercomputers - do less when it's too hot!
Pub Date : 2015-07-20 DOI: 10.1109/HPCSim.2015.7237025
Andrea Borghesi, C. Conficoni, M. Lombardi, Andrea Bartolini
Supercomputers, and HPC systems in general, embed sophisticated cooling circuits to remove heat and ensure the required peak performance. Unfortunately, removing heat by means of cold water or air costs additional power, which decreases the overall energy efficiency of the supercomputer. Free cooling uses ambient air instead of a chiller to cool down warm air or liquid. The amount of heat that can be removed for free depends on ambient conditions such as temperature and humidity. Power capping can be used to reduce the supercomputer's power dissipation and so maximize cooling efficiency. In this paper we present a power capping approach based on Constraint Programming that makes it possible to estimate, at every scheduling interval, the power consumption of a given job schedule and to select, among all possible job schedules, the one that maximizes the supercomputer's efficiency.
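The selection step described above can be pictured with a deliberately naive sketch. It replaces Constraint Programming with exhaustive enumeration, which is only workable for a handful of jobs, and it uses "number of busy nodes" as a stand-in objective. The job tuples, the per-job power estimates, and the objective are all assumptions for illustration, not the model used in the paper.

```python
from itertools import combinations


def best_job_set(jobs, power_cap_watts):
    """Pick the set of jobs that keeps the most nodes busy under a power cap.

    jobs -- list of (name, nodes, estimated_watts) tuples
    Returns the names of the chosen jobs in their original order.
    """
    best, best_nodes = (), 0
    for size in range(1, len(jobs) + 1):
        for subset in combinations(jobs, size):
            watts = sum(job[2] for job in subset)
            nodes = sum(job[1] for job in subset)
            # Keep the subset only if it respects the cap and uses more nodes.
            if watts <= power_cap_watts and nodes > best_nodes:
                best, best_nodes = subset, nodes
    return [job[0] for job in best]


# Invented queue (assumption): name, nodes requested, estimated power in watts.
QUEUE = [("a", 4, 400), ("b", 2, 150), ("c", 3, 300)]

hot_day = best_job_set(QUEUE, power_cap_watts=500)    # tighter cap
cool_day = best_job_set(QUEUE, power_cap_watts=1000)  # looser cap
```

Lowering the cap, as one might on a hot day when free cooling removes less heat, shrinks the set of jobs that can start in the current scheduling interval.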
Citations: 25
Quantum computing: How far away is it?
Pub Date : 2015-07-20 DOI: 10.1109/HPCSim.2015.7237090
K. Bertels
Moore's law is pushing the technology to the scale where quantum phenomena, such as quantum tunneling, can no longer be ignored. Whereas in conventional CMOS one tries to avoid unwanted quantum behaviour, quantum computing actually embraces these phenomena for computational purposes. The famous physicist Richard Feynman was the first to describe the idea of using superposition and entanglement as a way to model and simulate quantum phenomena.
Citations: 5
On-demand reconfiguration for coprocessors in mixed criticality multicore systems
Pub Date : 2015-07-20 DOI: 10.1109/HPCSim.2015.7237094
Viet Vu Duy, O. Sander, T. Sandmann, Jan Heidelberger, S. Bähr, J. Becker
Especially in complex system-of-systems scenarios, where multiple high-performance or real-time processing functions need to co-exist and interact, reconfigurable devices together with virtualization techniques show considerable promise to increase efficiency, ease integration and maintain functional and non-functional properties of the individual functions. In previous work, we proposed a concept that leverages the advantages of FPGA partial reconfiguration in heterogeneous mixed criticality multicore systems. That work described the basic idea of how to handle partial reconfiguration transparently for noncritical tasks while providing full control and predictable behavior for safety-relevant tasks. In this paper, we focus on the on-demand partial reconfiguration of coprocessors and its implementation details. Our prototype is implemented on an Intel multicore system and a Xilinx Virtex-7 FPGA connected via PCI Express, taking advantage of the Single-Root I/O Virtualization capabilities in modern PCI Express implementations. Experimental results show that, compared to the reference software implementation, our concept achieves significantly shorter reconfiguration time with lower variance under various system load situations.
Citations: 4