Collecting and Presenting Reproducible Intranode Stencil Performance: INSPECT
Julian Hornich, Julian Hammer, G. Hager, T. Gruber, G. Wellein
Stencil algorithms have been receiving considerable interest in HPC research for decades. The techniques used to approach multi-core stencil performance modeling and engineering span basic runtime measurements, elaborate performance models, detailed hardware counter analysis, and thorough scaling behavior evaluation. Due to the plurality of approaches and stencil patterns, we set out to develop a generalizable methodology for reproducible measurements accompanied by state-of-the-art performance models. Our open-source toolchain and collected results are publicly available in the "Intranode Stencil Performance Evaluation Collection" (INSPECT). We present the underlying methodologies, models, and tools involved in gathering and documenting the performance behavior of a collection of typical stencil patterns across multiple architectures and hardware configuration options. Our aim is to endow performance-aware application developers with reproducible baseline performance data and validated models to initiate a well-defined process of performance assessment and optimization.
{"title":"Collecting and Presenting Reproducible Intranode Stencil Performance: INSPECT","authors":"Julian Hornich, Julian Hammer, G. Hager, T. Gruber, G. Wellein","doi":"10.14529/JSFI190301","DOIUrl":"https://doi.org/10.14529/JSFI190301","url":null,"abstract":"Stencil algorithms have been receiving considerable interest in HPC research for decades. The techniques used to approach multi-core stencil performance modeling and engineering span basic runtime measurements, elaborate performance models, detailed hardware counter analysis, and thorough scaling behavior evaluation. Due to the plurality of approaches and stencil patterns, we set out to develop a generalizable methodology for reproducible measurements accompanied by state-of-the-art performance models. Our open-source toolchain, and collected results are publicly available in the \"Intranode Stencil Performance Evaluation Collection\" (INSPECT). We present the underlying methodologies, models and tools involved in gathering and documenting the performance behavior of a collection of typical stencil patterns across multiple architectures and hardware configuration options. Our aim is to endow performance-aware application developers with reproducible baseline performance data and validated models to initiate a well-defined process of performance assessment and optimization.","PeriodicalId":338883,"journal":{"name":"Supercomput. Front. Innov.","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127389293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
How File-access Patterns Influence the Degree of I/O Interference between Cluster Applications
A. Shah, C. Kuo, Akihiro Nomura, S. Matsuoka, F. Wolf
On large-scale clusters, tens to hundreds of applications can simultaneously access a parallel file system, leading to contention and, in its wake, to degraded application performance. In this article, we analyze the influence of file-access patterns on the degree of interference. Since experience shows it to be the most intrusive, we focus our attention on write-write contention. We observe considerable differences among the interference potentials of several typical write patterns. In particular, we found that if one parallel program writes large output files while another one writes small checkpointing files, the latter is slowed down when the checkpointing files are sufficiently small, and the former when they are not. Moreover, applications with only a few processes writing large output files can already significantly hinder applications with many processes from checkpointing small files. Such effects can seriously impact the runtime of real applications, by up to a factor of five in one instance. Our insights and measurement techniques offer an opportunity to automatically classify the interference potential between applications and to adjust scheduling decisions accordingly.
{"title":"How File-access Patterns Influence the Degree of I/O Interference between Cluster Applications","authors":"A. Shah, C. Kuo, Akihiro Nomura, S. Matsuoka, F. Wolf","doi":"10.14529/JSFI190203","DOIUrl":"https://doi.org/10.14529/JSFI190203","url":null,"abstract":"On large-scale clusters, tens to hundreds of applications can simultaneously access a parallel file system, leading to contention and, in its wake, to degraded application performance. In this article, we analyze the influence of file-access patterns on the degree of interference. As it is by experience most intrusive, we focus our attention on write-write contention. We observe considerable differences among the interference potentials of several typical write patterns. In particular, we found that if one parallel program writes large output files while another one writes small checkpointing files, then the latter is slowed down when the checkpointing files are small enough and the former is vice versa. Moreover, applications with a few processes writing large output files already can significantly hinder applications with many processes from checkpointing small files. Such effects can seriously impact the runtime of real applications—up to a factor of five in one instance. Our insights and measurement techniques offer an opportunity to automatically classify the interference potential between applications and to adjust scheduling decisions accordingly.","PeriodicalId":338883,"journal":{"name":"Supercomput. Front. Innov.","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122216373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HPC Processors Benchmarking Assessment for Global System Science Applications
D. Kaliszan, N. Meyer, S. Petruczynik, M. Gienger, Sergiy Gogolenko
The work presented in this paper was done in the Centre of Excellence for Global Systems Science (CoeGSS), an interdisciplinary project funded by the European Commission. The CoeGSS project provides computer-aided decision support in the face of global challenges (e.g., the development of energy, water, and food supply systems, urbanisation processes and the growth of cities, pandemic control, etc.) and aims to bring together HPC and global systems science. This paper proposes a GSS benchmark that evaluates HPC architectures with respect to GSS applications and seeks the best HPC system for typical GSS software environments. The outcome of the analysis is a benchmark that faithfully represents the average GSS environment and its challenges: the spread of smoking habits and the development of the tobacco industry, the development of the green car market, and global urbanisation processes. Results of tests run on a number of recently released HPC platforms allow us to compare processor architectures across different applications, using execution times, thermal design power (TDP), and total cost of ownership (TCO) as the basic metrics for ranking HPC architectures. Finally, we believe that our analysis conveys valuable information both to the broader GSS audience, helping them determine the hardware demands of their specific applications, and to the HPC community, which requires a mature benchmark set reflecting the requirements and traits of GSS applications. Our work can be considered a step toward the development of such a mature benchmark.
{"title":"HPC Processors Benchmarking Assessment for Global System Science Applications","authors":"D. Kaliszan, N. Meyer, S. Petruczynik, M. Gienger, Sergiy Gogolenko","doi":"10.14529/JSFI190202","DOIUrl":"https://doi.org/10.14529/JSFI190202","url":null,"abstract":"The work undertaken in this paper was done in the Centre of Excellence for Global Systems Science (CoeGSS) – an interdisciplinary project funded by the European Commission. CoeGSS project provides a computer-aided decision support in the face of global challenges (e.g. development of energy, water and food supply systems, urbanisation processes and growth of the cities, pandemic control, etc.) and tries to bring together HPC and global systems science. This paper presents a proposition of GSS benchmark which evaluates HPC architectures with respect to GSS applications and seeks for the best HPC system for typical GSS software environments. The outcome of the analysis is defining a benchmark which represents the average GSS environment and its challenges in a good way: spread of smoking habits and development of tobacco industry, development of green cars market and global urbanisation processes. Results of the tests that have been run on a number of recently appeared HPC platforms allow comparing processors’ architectures with respect to different applications using execution times, TDPs3 and TCOs4 as the basic metrics for ranking HPC architectures. Finally, we believe that our analysis of the results conveys a valuable information to the broadened GSS audience which might help to determine the hardware demands for their specific applications, as well as to the HPC community which requires a mature benchmark set reflecting requirements and traits of the GSS applications. Our work can be considered as a step into direction of development of such mature benchmark.","PeriodicalId":338883,"journal":{"name":"Supercomput. Front. Innov.","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127499916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Development of a RISC-V-Conform Fused Multiply-Add Floating-Point Unit
Felix Kaiser, Stefan Kosnac, U. Brüning
Although the open-source community around the RISC-V instruction set architecture is growing rapidly, there is still no high-speed open-source hardware implementation of the IEEE 754-2008 floating-point standard available. We designed a Fused Multiply-Add Floating-Point Unit compatible with the RISC-V ISA in SystemVerilog, which enables us to conduct detailed optimizations where necessary. The design has been verified with the industry-standard simulation-based Universal Verification Methodology using the Specman e Hardware Verification Language. The most challenging part of the verification is the reference model, for which we integrated the Floating-Point Unit of an existing Intel processor using the Function Level Interface provided by Specman e. With the use of Intel's Floating-Point Unit we have a "known good" and fast reference model. The back-end flow was done with GlobalFoundries' 22 nm Fully-Depleted Silicon-On-Insulator (GF22FDX) process using Cadence tools. We reached 1.8 GHz over PVT corners with a 0.8 V forward body bias, but there is still large potential for further RTL optimization. A power analysis conducted with stimuli generated by the verification environment yielded 212 mW.
{"title":"Development of a RISC-V-Conform Fused Multiply-Add Floating-Point Unit","authors":"Felix Kaiser, Stefan Kosnac, U. Brüning","doi":"10.14529/JSFI190205","DOIUrl":"https://doi.org/10.14529/JSFI190205","url":null,"abstract":"Despite the fact that the open-source community around the RISC-V instruction set architecture is growing rapidly, there is still no high-speed open-source hardware implementation of the IEEE 754-2008 floating-point standard available. We designed a Fused Multiply-Add Floating-Point Unit compatible with the RISC-V ISA in SystemVerilog, which enables us to conduct detailed optimizations where necessary. The design has been verified with the industry standard simulation-based Universal Verification Methodology using the Specman e Hardware Verification Language. The most challenging part of the verification is the reference model, for which we integrated the Floating-Point Unit of an existing Intel processor using the Function Level Interface provided by Specman e. With the use of Intel's Floating-Point Unit we have a ``known good\" and fast reference model. The Back-End flow was done with Global Foundries' 22 nm Fully-Depleted Silicon-On-Insulator (GF22FDX) process using Cadence tools. We reached 1.8 GHz over PVT corners with a 0.8 V forward body bias, but there is still a large potential for further RTL optimization. A power analysis was conducted with stimuli generated by the verification environment and resulted in 212 mW.","PeriodicalId":338883,"journal":{"name":"Supercomput. Front. Innov.","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124686596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comparative Analysis of Virtualization Methods in Big Data Processing
G. Radchenko, Ameer B. A. Alaasam, Andrei Tchernykh
Cloud computing systems have become widely used for Big Data processing, providing access to a wide variety of computing resources and greater distribution across multi-clouds. This trend has been strengthened by the rapid development of the Internet of Things (IoT) concept. Virtualization via virtual machines and containers is a traditional way of organizing cloud computing infrastructure, with containerization technology providing a lightweight virtual runtime environment. In addition to their advantages over traditional virtual machines in terms of size and flexibility, containers are particularly important for integration tasks in PaaS solutions, such as application packaging and service orchestration. In this paper, we survey the current state of the art in virtualization and containerization approaches and technologies in the context of Big Data processing. We present the results of studies that compare the efficiency of containerization and virtualization technologies for solving Big Data problems. We also analyze solutions for combining containerized and virtualized services to automate the deployment and execution of Big Data applications in cloud infrastructure.
{"title":"Comparative Analysis of Virtualization Methods in Big Data Processing","authors":"G. Radchenko, Ameer B. A. Alaasam, Andrei Tchernykh","doi":"10.14529/JSFI190107","DOIUrl":"https://doi.org/10.14529/JSFI190107","url":null,"abstract":"Cloud computing systems have become widely used for Big Data processing, providing access to a wide variety of computing resources and a greater distribution between multi-clouds. This trend has been strengthened by the rapid development of the Internet of Things (IoT) concept. Virtualization via virtual machines and containers is a traditional way of organization of cloud computing infrastructure. Containerization technology provides a lightweight virtual runtime environment. In addition to the advantages of traditional virtual machines in terms of size and flexibility, containers are particularly important for integration tasks for PaaS solutions, such as application packaging and service orchestration. In this paper, we overview the current state-of-the-art of virtualization and containerization approaches and technologies in the context of Big Data tasks solution. We present the results of studies which compare the efficiency of containerization and virtualization technologies to solve Big Data problems. We also analyze containerized and virtualized services collaboration solutions to support automation of the deployment and execution of Big Data applications in the cloud infrastructure.","PeriodicalId":338883,"journal":{"name":"Supercomput. Front. Innov.","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116467383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient Parallel Implementation of Multi-Arrival 3D Prestack Seismic Depth Migration
A. Pleshkevich, A. Ivanov, V. Levchenko, S. Khilkov, B. P. Moroz
The goal of seismic migration is to reconstruct an image of the Earth's depth inhomogeneities from seismic data. The data are obtained using shots in shallow wells located on a dense grid of points; these shots can be considered special point sources. Seismic waves reflected and scattered by the depth inhomogeneities are received by geophones, also located on a dense grid of points on the surface, and a seismic image of the depth inhomogeneities can be constructed from these waves. The implementation of 3-D seismic migration implies the solution of about 10^4 to 10^5 3-D direct problems of wave propagation, so efficient asymptotic methods are of great practical importance. Our multi-arrival 3-D seismic migration program is implemented on the basis of a new asymptotic method that takes multipath wave propagation and caustics into account. The program uses parallel calculations in an MPI environment on hundreds and thousands of processor cores. It was successfully tested on the international synthetic "SEG salt" data set and on real data; a seismic image cube for the Timan-Pechora region is given as an example.
{"title":"Efficient Parallel Implementation of Multi-Arrival 3D Prestack Seismic Depth Migration","authors":"A. Pleshkevich, A. Ivanov, V. Levchenko, S. Khilkov, B. P. Moroz","doi":"10.14529/JSFI190101","DOIUrl":"https://doi.org/10.14529/JSFI190101","url":null,"abstract":"The goal of seismic migration is to reconstruct the image of Earth's depth inhomogeneities on the base of seismic data. Seismic data is obtained using shots in shallow wells that are located in a dense grid points. Those shots could be considered as special point sources. A reflected and scattered seismic waves from the depth inhomogeneities are received by geophones located also in a dense grid points on a surface. A seismic image of depth inhomogeneities can be constructed based on these waves. The implementation of 3-D seismic migration implies the solution of about 10 4÷5 3-D direct problems of wave propagation. Hence efficient asymptotic methods are of a great practical importance. The multi-arrival 3-D seismic migration program is implemented based on a new asymptotic method. It takes into account multi-pass wave propagation and caustics. The program uses parallel calculations in an MPI environment on hundreds and thousands of processor cores. The program was successfully tested on an international synthetic \"SEG salt\" data set and on real data. A seismic image cube for Timan-Pechora region is given as an example.","PeriodicalId":338883,"journal":{"name":"Supercomput. Front. Innov.","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123919430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance Evaluation of Different Implementation Schemes of an Iterative Flow Solver on Modern Vector Machines
Kenta Yamaguchi, Takashi Soga, Yoichi Shimomura, Thorsten Reimann, K. Komatsu, Ryusuke Egawa, A. Musa, H. Takizawa, Hiroaki Kobayashi
Modern supercomputers consist of multi-core processors, and these processors have recently adopted vector instructions, or so-called SIMD instructions, to improve performance. Numerical simulations need to be vectorized in order to achieve high performance on these processors. Legacy numerical simulation codes that have been in use for a long time often contain two versions of the source code: a non-vectorized version and a vectorized version optimized for old vector supercomputers. It is important to clarify which version is better suited to modern supercomputers. In this paper, we evaluate the performance of a legacy fluid dynamics simulation code called FASTEST on modern supercomputers in order to provide a guidepost for migrating such codes. The solver has a non-vectorized version and a vectorized version; the latter uses the hyperplane ordering method for vectorization. For the evaluation, we also implement the red-black ordering method, another way to vectorize the solver. We then examine the performance on NEC SX-ACE, SX-Aurora TSUBASA, Intel Xeon Gold, and Xeon Phi. The results show that the shortest execution times are achieved with the red-black ordering method on SX-ACE and SX-Aurora TSUBASA, and with the non-vectorized version on Xeon Gold and Xeon Phi. Therefore, achieving high performance on multiple modern supercomputers potentially requires maintaining multiple code versions. We also show that the red-black ordering method is the more promising approach for achieving high performance on modern supercomputers.
{"title":"Performance Evaluation of Different Implementation Schemes of an Iterative Flow Solver on Modern Vector Machines","authors":"Kenta Yamaguchi, Takashi Soga, Yoichi Shimomura, Thorsten Reimann, K. Komatsu, Ryusuke Egawa, A. Musa, H. Takizawa, Hiroaki Kobayashi","doi":"10.14529/JSFI190106","DOIUrl":"https://doi.org/10.14529/JSFI190106","url":null,"abstract":"Modern supercomputers consist of multi-core processors, and these processors have recently employed vector instructions, or so-called SIMD instructions, to improve performances. Numerical simulations need to be vectorized in order to achieve higher performance on these processors. Various legacy numerical simulation codes that have been utilized for a long time often contain two versions of source codes: a non-vectorized version and a vectorized version that is optimized for old vector supercomputers. It is important to clarify which version is better for modern supercomputers in order to achieve higher performance. In this paper, we evaluate the performances of a legacy fluid dynamics simulation code called FASTEST on modern supercomputers in order to provide a guidepost for migrating such codes to modern supercomputers. The solver has a nonvectorized version and a vectorized version, and the latter uses the hyperplane ordering method for vectorization. For the evaluation, we also implement the red-black ordering method, which is another way to vectorize the solver. Then, we examine the performance on NEC SX-ACE, SXAurora TSUBASA, Intel Xeon Gold, and Xeon Phi. The results show that the shortest execution times are with the red-black ordering method on SX-ACE and SX-Aurora TSUBASA, and with the non-vectorized version on Xeon Gold and Xeon Phi. Therefore, achieving a higher performance on multiple modern supercomputers potentially requires maintenance of multiple code versions. We also show that the red-black ordering method is more promising to achieve high performance on modern supercomputers.","PeriodicalId":338883,"journal":{"name":"Supercomput. Front. Innov.","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128954592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Fully Conservative Parallel Numerical Algorithm with Adaptive Spatial Grid for Solving Nonlinear Diffusion Equations in Image Processing
A. Bulygin, D. Vrazhnov
In this paper we present a simple yet efficient parallel implementation of a grid-difference method for solving nonlinear parabolic equations that is both fully conservative and second-order accurate on a non-uniform spatial grid adapted to the geometry of the task. The proposed algorithm was tested on the Perona–Malik method for image noise filtering based on differential equations. We also propose a generalization of the Perona–Malik equation to diffusion in a complex-valued region. This corresponds to nonlinear equations of the Leontovich–Fock type, in which the diffraction coefficient depends on the field gradient according to a nonlinear law, and is a special case of the generalization of the Perona–Malik equation to the multicomponent case. This approach makes the noise removal process more flexible and increases its capabilities, allowing better results to be achieved for the image denoising task.
{"title":"A Fully Conservative Parallel Numerical Algorithm with Adaptive Spatial Grid for Solving Nonlinear Diffusion Equations in Image Processing","authors":"A. Bulygin, D. Vrazhnov","doi":"10.14529/JSFI190103","DOIUrl":"https://doi.org/10.14529/JSFI190103","url":null,"abstract":"In this paper we present simple yet efficient parallel program implementation of grid-difference method for solving nonlinear parabolic equations, which satisfies both fully conservative property and second order of approximation on non-uniform spatial grid according to geometrical sanity of a task. The proposed algorithm was tested on Perona–Malik method for image noise ltering task based on differential equations. Also in this work we propose generalization of the Perona–Malik equation, which is a one of diffusion in complex-valued region type. This corresponds to the conversion to such types of nonlinear equations like Leontovich–Fock equation with a dependent on the gradient field according to the nonlinear law coefficient of diffraction. This is a special case of generalization of the Perona–Malik equation to the multicomponent case. This approach makes noise removal process more flexible by increasing its capabilities, which allows achieving better results for the task of image denoising.","PeriodicalId":338883,"journal":{"name":"Supercomput. Front. Innov.","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127936057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Facilitating HPC Operation and Administration via Cloud
Chaoqun Sha, Jingfeng Zhang, Lei An, Yongsheng Zhang, Zhipeng Wang, T. Ilijaš, Nejc Bat, Miha Verlic, Qing Ji
Experiencing tremendous growth, cloud computing offers a number of advantages over other distributed platforms. Bringing in the advantages of High Performance Computing (HPC) has also driven the development of HPCaaS (HPC as a Service), which has mainly focused on flexible access to resources, cost-effectiveness, and freedom from maintenance for end-users. Besides providing and using HPCaaS, HPC centers could leverage cloud computing technology further, for instance to facilitate the operation and administration of deployed HPC systems, a challenge commonly faced by supercomputer centers. This paper reports on EasyOP, a product developed to realize the idea that one or more cloud or HPC facilities can be run from a centralized, unified control platform. With EasyOP, information about HPC system hardware and system software, failure alarms, job scheduling, etc. is sent to the Wuxi cloud computing center. After a series of analysis and processing steps, we are able to share valuable data, including alarm and job-scheduling status, with HPC users through SMS, email, and WeChat. More importantly, with the data accumulated at the cloud computing center, EasyOP can offer several easy-to-use functions, such as user management, monthly/yearly reports, single-screen monitoring, and so on. By the end of 2016, EasyOP successfully served more than 50 HPC systems with almost 10,000 nodes and over 300 regular users.
{"title":"Facilitating HPC Operation and Administration via Cloud","authors":"Chaoqun Sha, Jingfeng Zhang, Lei An, Yongsheng Zhang, Zhipeng Wang, T. Ilijaš, Nejc Bat, Miha Verlic, Qing Ji","doi":"10.14529/JSFI190105","DOIUrl":"https://doi.org/10.14529/JSFI190105","url":null,"abstract":"Experiencing a tremendous growth, Cloud Computing offers a number of advantages over other distributed platforms. Introducing the advantages of High Performance Computing (HPC) also brought forward the development of HPCaaS (HPC as a Service), which has mainly focused on flexible access to resources, cost-effectiveness, and the no-maintenance-needed for end-users. Besides providing and using HPCaaS, HPC centers could leverage more from Cloud Computing technology, for instance to facilitate operation and administration of deployed HPC systems, commonly faced by most supercomputer centers. This paper reports the product, EasyOP, developed to realize the idea that one or more Cloud or HPC facilities can be run over a centralized and unified control platform. The main purpose of EasyOP is that the information of HPC systems hardware and system software, failure alarms, jobs scheduling, etc. is sent to the Wuxi cloud computing center. After a series of analysis and processing, we are able to share many valuable data, including alarm and job scheduling status, to HPC users through SMS, email, and WeChat. More importantly, with the data accumulated on the cloud computing center, EasyOP can offer several easy-to-use functions, such as user(s) management, monthly/yearly reports, one-screen monitoring and so on. By the end of 2016, EasyOP successfully served more than 50 HPC systems with almost 10000 nodes and over of 300 regular users.","PeriodicalId":338883,"journal":{"name":"Supercomput. Front. Innov.","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121548203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
New Binding Mode of SLURP Protein to α7 Nicotinic Acetylcholine Receptor Revealed by Computer Simulations
Igor Diankin, D. Kudryavtsev, A. Zalevsky, V. Tsetlin, A. Golovin
SLURP-1 is a member of the family of three-finger toxin-like proteins. Their characteristic feature is a set of three beta-strands protruding from a hydrophobic core stabilized by disulfide bonds; each beta-strand carries a flexible loop responsible for recognition. SLURP-1 was recently shown to act as an endogenous growth regulator of keratinocytes and as a tumor suppressor that reduces cell migration and invasion by antagonizing the pro-malignant effects of nicotine. This effect is achieved through allosteric interaction with alpha7 nicotinic acetylcholine receptors (alpha7 nAChRs) in an antagonist-like manner. Moreover, this interaction is unaffected by several well-known agents, notably alpha-bungarotoxin. In this work, we carry out a conformational analysis of SLURP-1 by microsecond-long all-atom explicit-solvent molecular dynamics simulations followed by clustering to identify representative states. To reach this timescale we employed a GPU-accelerated version of the GROMACS modeling package. To avoid human bias in clustering, we used the non-parametric clustering algorithm Affinity Propagation, adapted for biomolecules and HPC environments. We then applied protein-protein molecular docking of the ten most populated clusters to alpha7 nAChRs in order to test whether structural variability can affect binding. The docking simulations revealed an unusual binding mode for one of the minor SLURP-1 conformations.
{"title":"New Binding Mode of SLURP Protein to a7 Nicotinic Acetylcholine Receptor Revealed by Computer Simulations","authors":"Igor Diankin, D. Kudryavtsev, A. Zalevsky, V. Tsetlin, A. Golovin","doi":"10.14529/JSFI180407","DOIUrl":"https://doi.org/10.14529/JSFI180407","url":null,"abstract":"SLURP-1 is a member of three-finger toxin-like proteins. Their characteristic feature is a set of three beta strands extruding from hydrophobic core stabilized by disulfide bonds. Each beta-strand carries a flexible loop, which is responsible for recognition. SLURP-1 was recently shown to act as an endogenous growth regulator of keratinocytes and tumor suppressor by reducing cell migration and invasion by antagonizing the pro-malignant effects of nicotine. This effect is achieved through allosteric interaction with alpha7 nicotinic acetylcholine receptors (alpha-7 nAChRs) in an antagonist-like manner. Moreover, this interaction is unaffected by several well-known agents specifically alpha-bungarotoxin. In this work, we carry out the conformational analysis of the SLURP-1 by a microsecond-long full-atom explicit solvent molecular dynamics simulations followed by clustering, to identify representative states. To achieve this timescale we employed a GPU-accelerated version of GROMACS modeling package. To avoid human bias in clustering we used a non-parametric clustering algorithm Affinity Propagation adapted for biomolecules and HPC environments. Then, we applied protein-protein molecular docking of the ten most massive clusters to alpha7-nAChRs in order to test if structural variability can affect binding. Docking simulations revealed the unusual binding mode of one of the minor SLURP-1 conformations.","PeriodicalId":338883,"journal":{"name":"Supercomput. Front. Innov.","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131138772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}