Supercomput. Front. Innov.: Latest Articles

Collecting and Presenting Reproducible Intranode Stencil Performance: INSPECT
Pub Date : 2019-06-19 DOI: 10.14529/JSFI190301
Julian Hornich, Julian Hammer, G. Hager, T. Gruber, G. Wellein
Stencil algorithms have received considerable interest in HPC research for decades. The techniques used to approach multi-core stencil performance modeling and engineering span basic runtime measurements, elaborate performance models, detailed hardware counter analysis, and thorough scaling behavior evaluation. Given the plurality of approaches and stencil patterns, we set out to develop a generalizable methodology for reproducible measurements accompanied by state-of-the-art performance models. Our open-source toolchain and collected results are publicly available in the "Intranode Stencil Performance Evaluation Collection" (INSPECT). We present the underlying methodologies, models, and tools involved in gathering and documenting the performance behavior of a collection of typical stencil patterns across multiple architectures and hardware configuration options. Our aim is to give performance-aware application developers reproducible baseline performance data and validated models with which to initiate a well-defined process of performance assessment and optimization.
Citations: 3
How File-access Patterns Influence the Degree of I/O Interference between Cluster Applications
Pub Date : 2019-06-11 DOI: 10.14529/JSFI190203
A. Shah, C. Kuo, Akihiro Nomura, S. Matsuoka, F. Wolf
On large-scale clusters, tens to hundreds of applications can simultaneously access a parallel file system, leading to contention and, in its wake, to degraded application performance. In this article, we analyze the influence of file-access patterns on the degree of interference. Because experience shows it to be the most intrusive, we focus on write-write contention. We observe considerable differences among the interference potentials of several typical write patterns. In particular, we found that if one parallel program writes large output files while another writes small checkpointing files, then the latter is slowed down when the checkpointing files are small enough, and vice versa for the former. Moreover, applications with only a few processes writing large output files can already significantly hinder applications with many processes from checkpointing small files. Such effects can seriously impact the runtime of real applications, by up to a factor of five in one instance. Our insights and measurement techniques offer an opportunity to automatically classify the interference potential between applications and to adjust scheduling decisions accordingly.
Citations: 1
HPC Processors Benchmarking Assessment for Global System Science Applications
Pub Date : 2019-06-04 DOI: 10.14529/JSFI190202
D. Kaliszan, N. Meyer, S. Petruczynik, M. Gienger, Sergiy Gogolenko
The work presented in this paper was carried out in the Centre of Excellence for Global Systems Science (CoeGSS), an interdisciplinary project funded by the European Commission. The CoeGSS project provides computer-aided decision support in the face of global challenges (e.g. development of energy, water and food supply systems, urbanisation processes and growth of cities, pandemic control, etc.) and tries to bring together HPC and global systems science. This paper proposes a GSS benchmark which evaluates HPC architectures with respect to GSS applications and seeks the best HPC system for typical GSS software environments. The outcome of the analysis is a benchmark that represents the average GSS environment and its challenges well: the spread of smoking habits and development of the tobacco industry, development of the green car market, and global urbanisation processes. Results of tests run on a number of recent HPC platforms allow comparing processor architectures with respect to different applications, using execution times, TDPs, and TCOs as the basic metrics for ranking HPC architectures. Finally, we believe that our analysis of the results conveys valuable information to the broader GSS audience, which might help to determine the hardware demands of their specific applications, as well as to the HPC community, which requires a mature benchmark set reflecting the requirements and traits of GSS applications. Our work can be considered a step towards developing such a mature benchmark.
Citations: 4
Development of a RISC-V-Conform Fused Multiply-Add Floating-Point Unit
Pub Date : 2019-06-01 DOI: 10.14529/JSFI190205
Felix Kaiser, Stefan Kosnac, U. Brüning
Despite the fact that the open-source community around the RISC-V instruction set architecture is growing rapidly, there is still no high-speed open-source hardware implementation of the IEEE 754-2008 floating-point standard available. We designed a Fused Multiply-Add Floating-Point Unit compatible with the RISC-V ISA in SystemVerilog, which enables us to conduct detailed optimizations where necessary. The design has been verified with the industry standard simulation-based Universal Verification Methodology using the Specman e Hardware Verification Language. The most challenging part of the verification is the reference model, for which we integrated the Floating-Point Unit of an existing Intel processor using the Function Level Interface provided by Specman e. With the use of Intel's Floating-Point Unit we have a "known good" and fast reference model. The Back-End flow was done with Global Foundries' 22 nm Fully-Depleted Silicon-On-Insulator (GF22FDX) process using Cadence tools. We reached 1.8 GHz over PVT corners with a 0.8 V forward body bias, but there is still a large potential for further RTL optimization. A power analysis was conducted with stimuli generated by the verification environment and resulted in 212 mW.
Citations: 2
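As a brief illustration of what "fused" means in the entry above: a fused multiply-add computes a*b + c with a single rounding at the end, whereas a separate multiply followed by an add rounds twice. The sketch below is not the authors' SystemVerilog design or their reference model; `fma_reference` is a hypothetical helper that emulates the single rounding in software with exact rational arithmetic, assuming finite double-precision inputs and CPython's integer division behavior.

```python
from fractions import Fraction


def fma_reference(a, b, c):
    """Software stand-in for a fused multiply-add on finite doubles.

    The product and sum are formed exactly as rationals, and only the
    final conversion to float rounds (hypothetical helper, not a hardware model).
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

# Unfused: the exact product 1 - 2**-60 is first rounded to 1.0,
# so the subsequent addition of -1.0 yields 0.0.
unfused = a * b + c

# Fused: the exact product survives until the single final rounding,
# so the tiny term -2**-60 is preserved.
fused = fma_reference(a, b, c)

print(unfused, fused)
```

The inputs are chosen so that the intermediate rounding of the product discards exactly the information the final result depends on, which is the case a fused unit is designed to handle.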
Comparative Analysis of Virtualization Methods in Big Data Processing
Pub Date : 2019-04-10 DOI: 10.14529/JSFI190107
G. Radchenko, Ameer B. A. Alaasam, Andrei Tchernykh
Cloud computing systems have become widely used for Big Data processing, providing access to a wide variety of computing resources and a greater distribution between multi-clouds. This trend has been strengthened by the rapid development of the Internet of Things (IoT) concept. Virtualization via virtual machines and containers is a traditional way of organizing cloud computing infrastructure. Containerization technology provides a lightweight virtual runtime environment. In addition to the advantages of traditional virtual machines in terms of size and flexibility, containers are particularly important for integration tasks for PaaS solutions, such as application packaging and service orchestration. In this paper, we survey the current state of the art of virtualization and containerization approaches and technologies in the context of solving Big Data tasks. We present the results of studies which compare the efficiency of containerization and virtualization technologies in solving Big Data problems. We also analyze collaboration solutions for containerized and virtualized services that support automating the deployment and execution of Big Data applications in the cloud infrastructure.
Citations: 12
Efficient Parallel Implementation of Multi-Arrival 3D Prestack Seismic Depth Migration
Pub Date : 2019-03-11 DOI: 10.14529/JSFI190101
A. Pleshkevich, A. Ivanov, V. Levchenko, S. Khilkov, B. P. Moroz
The goal of seismic migration is to reconstruct an image of the Earth's depth inhomogeneities on the basis of seismic data. Seismic data is obtained using shots in shallow wells located at the points of a dense grid. These shots can be considered special point sources. Seismic waves reflected and scattered by the depth inhomogeneities are received by geophones, also located at the points of a dense grid on the surface. A seismic image of the depth inhomogeneities can be constructed from these waves. Implementing 3-D seismic migration requires solving about 10^4 to 10^5 3-D direct problems of wave propagation. Hence efficient asymptotic methods are of great practical importance. The multi-arrival 3-D seismic migration program is implemented based on a new asymptotic method. It takes into account multi-pass wave propagation and caustics. The program uses parallel calculations in an MPI environment on hundreds and thousands of processor cores. The program was successfully tested on an international synthetic "SEG salt" data set and on real data. A seismic image cube for the Timan-Pechora region is given as an example.
Citations: 0
Performance Evaluation of Different Implementation Schemes of an Iterative Flow Solver on Modern Vector Machines
Pub Date : 2019-03-01 DOI: 10.14529/JSFI190106
Kenta Yamaguchi, Takashi Soga, Yoichi Shimomura, Thorsten Reimann, K. Komatsu, Ryusuke Egawa, A. Musa, H. Takizawa, Hiroaki Kobayashi
Modern supercomputers consist of multi-core processors, and these processors have recently adopted vector instructions, also called SIMD instructions, to improve performance. Numerical simulations need to be vectorized in order to achieve higher performance on these processors. Legacy numerical simulation codes that have been in use for a long time often contain two versions of the source code: a non-vectorized version and a vectorized version optimized for old vector supercomputers. It is important to clarify which version is better suited to modern supercomputers. In this paper, we evaluate the performance of a legacy fluid dynamics simulation code called FASTEST on modern supercomputers in order to provide a guidepost for migrating such codes to modern supercomputers. The solver has a non-vectorized version and a vectorized version, and the latter uses the hyperplane ordering method for vectorization. For the evaluation, we also implement the red-black ordering method, which is another way to vectorize the solver. Then, we examine the performance on NEC SX-ACE, SX-Aurora TSUBASA, Intel Xeon Gold, and Xeon Phi. The results show that the shortest execution times are achieved with the red-black ordering method on SX-ACE and SX-Aurora TSUBASA, and with the non-vectorized version on Xeon Gold and Xeon Phi. Therefore, achieving higher performance on multiple modern supercomputers potentially requires maintaining multiple code versions. We also show that the red-black ordering method is the more promising route to high performance on modern supercomputers.
Citations: 4
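The red-black ordering mentioned in the entry above can be sketched on a much simpler problem than the FASTEST solver. The example below is an assumption-laden illustration, not code from the paper: it applies a Gauss-Seidel sweep to the 2D Laplace equation on a small grid with fixed boundary values, and the function name `red_black_sweep` is hypothetical. Grid points are colored like a checkerboard; every "red" point depends only on "black" neighbors and vice versa, so all points of one color can be updated independently, which is what makes the scheme amenable to vector instructions.

```python
def red_black_sweep(u):
    """One Gauss-Seidel sweep over the interior of u in red-black order.

    u is a list of lists of floats; boundary rows and columns are held fixed.
    Points with (i + j) even are updated first ("red"), then the rest ("black").
    """
    n, m = len(u), len(u[0])
    for color in (0, 1):
        for i in range(1, n - 1):
            # First interior column j >= 1 with (i + j) % 2 == color.
            start = 1 + ((i + 1 + color) % 2)
            # Points of one color never read each other, so this inner
            # loop has no loop-carried dependency.
            for j in range(start, m - 1, 2):
                u[i][j] = 0.25 * (
                    u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                )
    return u


# Boundary fixed at 1.0, interior starting at 0.0.
grid = [[1.0] * 6 for _ in range(6)]
for i in range(1, 5):
    for j in range(1, 5):
        grid[i][j] = 0.0

for _ in range(200):
    red_black_sweep(grid)
```

With a constant boundary the exact solution is the same constant everywhere, which gives a simple check that the sweeps converge.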
A Fully Conservative Parallel Numerical Algorithm with Adaptive Spatial Grid for Solving Nonlinear Diffusion Equations in Image Processing
Pub Date : 2019-02-26 DOI: 10.14529/JSFI190103
A. Bulygin, D. Vrazhnov
In this paper we present a simple yet efficient parallel implementation of a grid-difference method for solving nonlinear parabolic equations, which is both fully conservative and second-order accurate on a non-uniform spatial grid adapted to the geometry of the task. The proposed algorithm was tested on the Perona–Malik method for image noise filtering, which is based on differential equations. We also propose a generalization of the Perona–Malik equation to diffusion in a complex-valued domain. This corresponds to a transition to nonlinear equations such as the Leontovich–Fock equation, with a diffraction coefficient that depends on the gradient of the field according to a nonlinear law. This is a special case of generalizing the Perona–Malik equation to the multicomponent case. This approach makes the noise removal process more flexible and more capable, which allows better results in image denoising.
Citations: 2
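For readers unfamiliar with Perona–Malik filtering, the following is a minimal sketch of the classic explicit scheme on a uniform grid. It is not the authors' algorithm (which uses a non-uniform grid and a complex-valued generalization); the function name `perona_malik_step`, the exponential edge-stopping function, the periodic boundaries, and the parameter values `dt` and `kappa` are all assumptions chosen for illustration. Because each pairwise flux is added to one pixel and subtracted from its neighbor, the total image intensity is preserved, which is the discrete conservation property in its simplest form.

```python
import math


def perona_malik_step(img, dt=0.2, kappa=0.1):
    """One explicit Perona-Malik diffusion step on a 2D list of floats.

    Periodic boundaries are assumed. dt <= 0.25 keeps the four-neighbor
    explicit scheme stable.
    """
    n, m = len(img), len(img[0])

    def flux(d):
        # Edge-stopping function g(d) = exp(-(d / kappa)^2) times the
        # difference d: small differences (noise) diffuse, large ones (edges) do not.
        return math.exp(-((d / kappa) ** 2)) * d

    out = [row[:] for row in img]
    for i in range(n):
        for j in range(m):
            c = img[i][j]
            out[i][j] = c + dt * (
                flux(img[(i - 1) % n][j] - c)
                + flux(img[(i + 1) % n][j] - c)
                + flux(img[i][(j - 1) % m] - c)
                + flux(img[i][(j + 1) % m] - c)
            )
    return out


# A small deterministic test image with low-amplitude variation.
image = [[0.5 + 0.01 * ((3 * i + 5 * j) % 7) for j in range(4)] for i in range(4)]
smoothed = perona_malik_step(image)
```

Repeated application smooths low-contrast variation while differences much larger than `kappa` are left nearly untouched.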
Facilitating HPC Operation and Administration via Cloud
Pub Date : 2019-02-26 DOI: 10.14529/JSFI190105
Chaoqun Sha, Jingfeng Zhang, Lei An, Yongsheng Zhang, Zhipeng Wang, T. Ilijaš, Nejc Bat, Miha Verlic, Qing Ji
Experiencing tremendous growth, Cloud Computing offers a number of advantages over other distributed platforms. Introducing the advantages of High Performance Computing (HPC) has also brought forward the development of HPCaaS (HPC as a Service), which has mainly focused on flexible access to resources, cost-effectiveness, and freedom from maintenance for end users. Besides providing and using HPCaaS, HPC centers could gain more from Cloud Computing technology, for instance to facilitate the operation and administration of deployed HPC systems, a task faced by most supercomputer centers. This paper reports on EasyOP, a product developed to realize the idea that one or more Cloud or HPC facilities can be run from a centralized and unified control platform. The main purpose of EasyOP is to send information about HPC system hardware and system software, failure alarms, job scheduling, etc. to the Wuxi cloud computing center. After a series of analysis and processing steps, we are able to share valuable data, including alarm and job scheduling status, with HPC users through SMS, email, and WeChat. More importantly, with the data accumulated at the cloud computing center, EasyOP can offer several easy-to-use functions, such as user management, monthly/yearly reports, one-screen monitoring and so on. By the end of 2016, EasyOP had successfully served more than 50 HPC systems with almost 10,000 nodes and over 300 regular users.
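The data flow described in this abstract (several HPC systems report to one central platform, which stores the events and forwards alarm and job status to users over different channels) can be illustrated with a small sketch. It is not EasyOP's implementation: the class `CentralCollector`, the `Event` fields, and the channel names are all hypothetical, and nothing is actually sent; notifications are only collected in a list.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    cluster: str   # name of the HPC system that reported the event (hypothetical)
    kind: str      # event type, e.g. "alarm" or "job"
    message: str   # human-readable description


class CentralCollector:
    """Hypothetical stand-in for a centralized control platform."""

    def __init__(self):
        self.events = []          # accumulated history, usable for reports
        self.subscriptions = {}   # (cluster, kind) -> list of (user, channel)
        self.outbox = []          # notifications that would be delivered

    def subscribe(self, user, channel, cluster, kind):
        """Register a user to be notified over a channel ("sms", "email", ...)."""
        self.subscriptions.setdefault((cluster, kind), []).append((user, channel))

    def ingest(self, event):
        """Store an incoming event and queue one notification per subscriber."""
        self.events.append(event)
        for user, channel in self.subscriptions.get((event.cluster, event.kind), []):
            self.outbox.append((channel, user, f"[{event.cluster}] {event.message}"))

    def report(self):
        """Count stored events per (cluster, kind), as a periodic report might."""
        return dict(Counter((e.cluster, e.kind) for e in self.events))


collector = CentralCollector()
collector.subscribe("alice", "email", "cluster-a", "alarm")
collector.ingest(Event("cluster-a", "alarm", "node 17 unreachable"))
collector.ingest(Event("cluster-b", "alarm", "fan failure"))
print(collector.outbox)
print(collector.report())
```

Keeping storage (`events`) separate from delivery (`outbox`) mirrors the abstract's distinction between data accumulated centrally for reports and status messages pushed to users.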
Citations: 2
New Binding Mode of SLURP Protein to α7 Nicotinic Acetylcholine Receptor Revealed by Computer Simulations
Pub Date : 2018-12-01 DOI: 10.14529/JSFI180407
Igor Diankin, D. Kudryavtsev, A. Zalevsky, V. Tsetlin, A. Golovin
SLURP-1 is a member of the three-finger toxin-like protein family. The characteristic feature of these proteins is a set of three beta strands extruding from a hydrophobic core stabilized by disulfide bonds. Each beta strand carries a flexible loop, which is responsible for recognition. SLURP-1 was recently shown to act as an endogenous growth regulator of keratinocytes and as a tumor suppressor, reducing cell migration and invasion by antagonizing the pro-malignant effects of nicotine. This effect is achieved through allosteric interaction with alpha7 nicotinic acetylcholine receptors (alpha7 nAChRs) in an antagonist-like manner. Moreover, this interaction is unaffected by several well-known agents, specifically alpha-bungarotoxin. In this work, we carry out a conformational analysis of SLURP-1 by microsecond-long all-atom explicit-solvent molecular dynamics simulations followed by clustering to identify representative states. To achieve this timescale we employed a GPU-accelerated version of the GROMACS modeling package. To avoid human bias in clustering we used a non-parametric clustering algorithm, Affinity Propagation, adapted for biomolecules and HPC environments. Then, we applied protein-protein molecular docking of the ten most massive clusters to alpha7 nAChRs in order to test whether structural variability can affect binding. Docking simulations revealed an unusual binding mode of one of the minor SLURP-1 conformations.
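The step between clustering and docking, picking the ten "most massive" clusters, can be sketched as follows. This assumes that "most massive" means the clusters containing the most trajectory frames, and that the clustering step has already produced one integer label per frame. The function names `largest_clusters` and `frames_in` are hypothetical and are not taken from the authors' code.

```python
from collections import Counter


def largest_clusters(labels, n=10):
    """Return (cluster_id, frame_count) pairs for the n most populated clusters.

    `labels` holds one cluster id per trajectory frame, as a clustering
    algorithm such as Affinity Propagation would assign them.
    """
    counts = Counter(labels)
    # Sort by descending size, then by cluster id so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


def frames_in(labels, cluster_id):
    """Return the indices of all frames assigned to the given cluster."""
    return [i for i, label in enumerate(labels) if label == cluster_id]


# Toy example: nine frames assigned to three clusters.
labels = [0, 0, 1, 2, 2, 2, 1, 0, 2]
print(largest_clusters(labels, n=2))
print(frames_in(labels, 2))
```

Ranking by population keeps the well-sampled states, while lowering the cutoff or raising `n` would also bring in the minor conformations, one of which the abstract reports as showing the unusual binding mode.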
Citations: 0