
IBM Journal of Research and Development: Latest Articles

Implementation of improvement actions in a company that produces frames and moldings
IF 1.3; Zone 4 (Computer Science); Q1 Computer Science; Pub Date: 2020-06-30; DOI: 10.35429/jrd.2021.19.7.22.30
R. Fornés-Rivera, Marco Antonio Conant-Pablos, Adolfo Cano-Carrasco, Gildardo Guadalupe López-Rojo
This research was carried out in the production and quality area of a company that manufactures frames and moldings. It addresses the need for improvement actions arising from rework and low production at the patching workstation, caused by flaws such as poor patching, bumps, bubbles, and porosity in the products. At the time of the study, the production record was 1.75% and rework was 19.25% in the first hours of the working day. The objective was to implement improvement actions, through the 8D methodology, to reduce rework and increase production. The procedure involved forming a team; defining the problem; implementing containment actions; identifying and verifying the root cause; determining permanent corrective actions; identifying and implementing permanent corrective actions; preventing the recurrence of the problem and/or root cause; and acknowledging the effort of the team. The work increased production and reduced rework at the patching workstation, thus fulfilling the objective of the research.
Citations: 0
Design of a convolutional neural network for classification of biomedical signals
IF 1.3; Zone 4 (Computer Science); Q1 Computer Science; Pub Date: 2020-06-30; DOI: 10.35429/jrd.2020.17.6.15.20
Jaime Jalomo, Edith Preciado, Jorge Gudiño
Biomedical signals are currently a subject of cutting-edge study. Thanks to advances in artificial intelligence, new methods for processing these signals are implemented every day, mainly to detect anomalies or diseases with greater precision. A solution based on deep learning is proposed. This technology has proven efficient in handling high-level feature data, and it includes convolutional neural networks (CNNs), which are well suited to image processing. In this paper, electrocardiographic (ECG) signals generated from a dynamic mathematical model are classified with a CNN that has two convolution layers.
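The two-convolution-layer classifier mentioned in this abstract can be illustrated with a minimal forward pass. This is only a sketch under assumptions: the kernels, the pooling size, the global-average readout, the two-class output, and the noisy sine wave standing in for a model-generated ECG are all hypothetical, and no training is shown. It is not the authors' network.

```python
import math
import random

def conv1d(signal, kernel):
    """Valid 1-D cross-correlation followed by ReLU."""
    n = len(signal) - len(kernel) + 1
    return [max(0.0, sum(signal[i + j] * kernel[j] for j in range(len(kernel))))
            for i in range(n)]

def max_pool(signal, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(signal[i:i + size]) for i in range(0, len(signal) - size + 1, size)]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(signal, k1, k2, weights):
    """Two convolution layers, global average pooling, linear layer, softmax."""
    h = max_pool(conv1d(signal, k1))
    h = max_pool(conv1d(h, k2))
    feature = sum(h) / len(h)
    return softmax([w * feature + b for w, b in weights])

# Hypothetical stand-in for a synthetic ECG trace: a noisy sine wave.
random.seed(0)
ecg_like = [math.sin(2 * math.pi * t / 50) + 0.05 * random.gauss(0, 1)
            for t in range(200)]
k1 = [0.25, 0.5, 0.25]               # smoothing kernel (untrained, illustrative)
k2 = [-1.0, 0.0, 1.0]                # edge-like kernel (untrained, illustrative)
weights = [(1.0, 0.0), (-1.0, 0.0)]  # (weight, bias) per class
probs = classify(ecg_like, k1, k2, weights)
```

With untrained weights the class probabilities carry no meaning; the point is only the data flow through two convolution stages before classification.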
Citations: 0
Synthesis of barium ferrite, using barite mineral ore and a metallurgical waste
IF 1.3; Zone 4 (Computer Science); Q1 Computer Science; Pub Date: 2020-05-01; DOI: 10.35429/jrd.2019.17.6.1.8
M. G. ROSALES-SOSA, Manuel Garcia-Yregoi, Blanca Idalia Rosales-Sosa, R. Servin-Castañeda
Samples of barite mineral ore were ground to 250 mesh and then subjected to a leaching stage with hydrochloric acid for different times. The leached barite ore then underwent a carbonation stage in which parameters such as pH, temperature, time, and agitation speed were controlled. Finally, it was sintered with an Fe2O3 precursor obtained from the waste powder of a steelmaking company's rolling process, in the temperature range of 1000 to 1200 °C, for durations of 12 and 24 (time units not stated). The materials obtained were characterized by infrared (IR) spectroscopy, X-ray diffraction (XRD), and scanning electron microscopy (SEM).
Citations: 0
Preface: Summit and Sierra Supercomputers
IF 1.3; Zone 4 (Computer Science); Q1 Computer Science; Pub Date: 2020-03-13; DOI: 10.1147/JRD.2020.2976169
Citations: 1
Next-generation geospatial-temporal information technologies for disaster management
IF 1.3; Zone 4 (Computer Science); Q1 Computer Science; Pub Date: 2020-01-31; DOI: 10.1147/JRD.2020.2970903
C. M. Albrecht;B. Elmegreen;O. Gunawan;H. F. Hamann;L. J. Klein;S. Lu;F. Mariano;C. Siebenschuh;J. Schmude
Traditional geographic information systems (GIS) have been disrupted by the emergence of Big Data in the form of geo-coded raster, vector, and time-series Internet-of-Things data. This article discusses the application of new scalable technologies that go far beyond relational databases and file-based storage on spinning disk or tape to incorporate both storage and processing of data in the same platform. The roles of the Apache Hadoop Distributed File System and NoSQL key-value stores such as Apache HBase are discussed, along with indexing schemes that optimally support geospatial-temporal use. We highlight how this new approach can rapidly search multiple GIS data layers to obtain insights in the context of early warning, impact evaluation, response, and recovery for earthquake and wildfire disasters.
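One common way to index geospatial-temporal data in a key-value store is to build a sortable row key from a space-filling-curve cell and a time bucket. The abstract does not say which scheme the article uses, so the sketch below is an assumption for illustration: the Z-order (Morton) encoding, the 16-bit grid, the one-hour bucket, and the key layout are all hypothetical choices.

```python
def interleave_bits(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x occupies even bit positions
        code |= ((y >> i) & 1) << (2 * i + 1)   # y occupies odd bit positions
    return code

def quantize(value, lo, hi, bits=16):
    """Map a value in [lo, hi] to an integer cell index in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    return min(cells, max(0, int((value - lo) / (hi - lo) * cells)))

def row_key(lat, lon, epoch_seconds, bucket_seconds=3600, bits=16):
    """Build a sortable key: Morton code of the grid cell, then the time bucket."""
    cell = interleave_bits(quantize(lon, -180.0, 180.0, bits),
                           quantize(lat, -90.0, 90.0, bits), bits)
    bucket = epoch_seconds // bucket_seconds
    width = (2 * bits + 3) // 4                 # hex digits needed for the cell
    return f"{cell:0{width}x}:{bucket:012d}"

key = row_key(35.93, -84.31, 1_580_428_800)
```

Putting the cell first groups all time buckets of one location under a common key prefix, so a prefix scan returns that location's time series; putting the time bucket first would instead favor "everything in this hour" queries.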
Citations: 8
Communication protocol optimization for enhanced GPU performance
IF 1.3; Zone 4 (Computer Science); Q1 Computer Science; Pub Date: 2020-01-16; DOI: 10.1147/JRD.2020.2967311
S. S. Sharkawi;G. A. Chochia
The U.S. Department of Energy CORAL program systems SUMMIT and SIERRA are based on hybrid servers comprising IBM POWER9 CPUs and NVIDIA V100 graphics processing units (GPUs) connected by two extended data rate (EDR) links to a high-speed InfiniBand network. A major challenge for the communication software stack is to optimize performance for all combinations of data origin and destination: host or GPU memory, same or different server. Alternate paths exist for routing data from GPU memory. When origin and destination are on different servers, data can be sent either via host memory or bypassing host memory with the GPUDirect feature. When origin and destination are on the same server, host memory can be bypassed with peer-to-peer interprocess communication (IPC). For large messages, pipelining makes the host memory data path competitive with GPUDirect. In this article, we explain the techniques used in the Spectrum MPI Parallel Active Message Interface layer to cache memory types and attributes in order to reduce the overhead associated with calling the CUDA application programming interface (API); in addition, we detail the different protocols used for different memory types: device memory, managed memory, and host memory. To illustrate, the caching technique achieved a device-to-device latency improvement of 26% for intranode transfers and 19% for internode transfers.
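The idea of caching memory types to avoid repeated CUDA API queries, and of choosing a data path from the memory type and server locality, can be sketched as follows. This is a simulation under assumptions, not Spectrum MPI or PAMI code: `AttributeCache`, `choose_path`, the 1 MiB pipelining threshold, and the rule that large internode messages take the pipelined host path are all hypothetical. The `query` callable stands in for an expensive driver lookup (in CUDA, `cudaPointerGetAttributes` is one such call; whether the article's implementation uses it is not stated in the abstract).

```python
from enum import Enum

class MemType(Enum):
    HOST = "host"
    DEVICE = "device"
    MANAGED = "managed"

class AttributeCache:
    """Remember the memory type of each buffer address to skip repeated queries."""

    def __init__(self, query):
        self._query = query   # expensive lookup, e.g. a driver call (assumed)
        self._cache = {}
        self.misses = 0

    def mem_type(self, address):
        """Return the cached type, querying only on the first request."""
        if address not in self._cache:
            self.misses += 1
            self._cache[address] = self._query(address)
        return self._cache[address]

    def invalidate(self, address):
        """Drop an entry when its buffer is freed, so stale types are not reused."""
        self._cache.pop(address, None)

def choose_path(src_type, same_server, size, pipeline_threshold=1 << 20):
    """Pick a data path among the alternatives listed in the abstract.

    Managed memory is treated like device memory here, which is a simplification.
    """
    if src_type is MemType.HOST:
        return "host"
    if same_server:
        return "peer-to-peer IPC"
    if size >= pipeline_threshold:
        return "pipelined via host memory"
    return "GPUDirect"

# Usage: a dictionary plays the role of the driver's pointer-attribute table.
fake_driver = {0x1000: MemType.DEVICE, 0x2000: MemType.HOST}
cache = AttributeCache(lambda addr: fake_driver[addr])
first = cache.mem_type(0x1000)    # miss: queries the stand-in driver
second = cache.mem_type(0x1000)   # hit: served from the cache
path = choose_path(first, same_server=False, size=4096)
```

The invalidation hook matters in any such cache: an address can be freed and later reallocated with a different memory type, so a cache without invalidation would return stale attributes.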
Citations: 5
The high-speed networks of the Summit and Sierra supercomputers
IF 1.3; Zone 4 (Computer Science); Q1 Computer Science; Pub Date: 2020-01-16; DOI: 10.1147/JRD.2020.2967330
C. B. Stunkel;R. L. Graham;G. Shainer;M. Kagan;S. S. Sharkawi;B. Rosenburg;G. A. Chochia
Oak Ridge National Laboratory's Summit supercomputer and Lawrence Livermore National Laboratory's Sierra supercomputer use an InfiniBand interconnect in a Fat-tree network topology, interconnecting all compute nodes, storage nodes, administration, and management nodes into one linearly scalable network. These networks are based on Mellanox 100-Gb/s EDR InfiniBand ConnectX-5 adapters and Switch-IB2 switches, with compute-rack packaging and cooling contributions from IBM. These devices support in-network computing acceleration engines such as Mellanox Scalable Hierarchical Aggregation and Reduction Protocol, graphics processor unit (GPU) Direct RDMA, advanced adaptive routing, Quality of Service, and other network and application acceleration. The overall IBM Spectrum Message Passing Interface (MPI) messaging software stack implements Open MPI, and was a collaboration between IBM, Mellanox, and NVIDIA to optimize direct communication between endpoints, whether compute nodes (with IBM POWER CPUs, NVIDIA GPUs, and flash memory devices), or POWER-hosted storage nodes. The Fat-tree network can isolate traffic among the compute partitions and to/from the storage subsystem, providing more predictable application performance. In addition, the high level of redundancy of this network and its reconfiguration capability ensure reliable high performance even after network component failures. This article details the hardware and software architecture and performance of the networks and describes a number of the high-performance computing (HPC) enhancements engineered into this generation of InfiniBand.
Citations: 14
Concurrent installation and acceptance of Summit and Sierra supercomputers
IF 1.3; Zone 4 (Computer Science); Q1 Computer Science; Pub Date: 2020-01-16; DOI: 10.1147/JRD.2020.2967270
T. Liebsch
The deployment of any high-performance computer system typically includes an acceptance process to validate the system's specifications, covering hardware, software, and delivered services. In this article, we describe the efforts undertaken by IBM and its partners to accomplish early preparations and then concurrently deliver, stabilize, and accept the two fastest supercomputers in the world at the time of deployment.
Citations: 1
Cluster system management
IF 1.3; Zone 4 (Computer Science); Q1 Computer Science; Pub Date: 2020-01-16; DOI: 10.1147/JRD.2020.2967309
N. Besaw;L. Scheidenbach;J. Dunham;S. Kaur;A. Ohmacht;F. Pizzano;Y. Park
Cluster system management (CSM) was co-designed with the Department of Energy Labs to provide the support necessary to effectively manage the Summit and Sierra supercomputers. The CSM system administration tools provide a unified view of a large-scale cluster and the ability to examine and understand data from multiple sources. CSM consists of five components: 1) application programming interfaces (APIs) and infrastructure; 2) Big Data Store; 3) support for reliability, availability, and serviceability (RAS); 4) Diagnostic and Health Check; and 5) support for job management. APIs and infrastructure provide lightweight daemons for compute nodes, hardware and software inventory collection, job accounting, and RAS. Logs, environmental data, and performance data are collected in the Big Data Store for analysis. RAS events can trigger corrective actions by CSM. Diagnostic and Health Check are provided through a diagnostic framework and test results collection. To support job management, CSM coordinates with the Job Step Manager (JSM) to provide an overlay network of JSM daemons. CSM is open source and available at https://github.com/IBM/CAST. Documentation can be found at https://cast.readthedocs.io.
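The statement that RAS events can trigger corrective actions describes an event-to-handler dispatch pattern, which the following sketch illustrates. It does not use or reproduce the CSM API: `RasDispatcher`, the message ID `gpu.double_bit_error`, the node name, and the `drain_node` action are all hypothetical names chosen for the example.

```python
from collections import defaultdict

class RasDispatcher:
    """Route RAS events to registered corrective actions by message ID."""

    def __init__(self):
        self._actions = defaultdict(list)
        self.log = []   # every event is recorded, with or without a handler

    def on(self, msg_id, action):
        """Register a corrective action for one kind of event."""
        self._actions[msg_id].append(action)

    def emit(self, msg_id, node):
        """Record the event and run each registered action for it."""
        self.log.append((msg_id, node))
        return [action(node) for action in self._actions[msg_id]]

drained = set()

def drain_node(node):
    """Hypothetical corrective action: stop scheduling new jobs on the node."""
    drained.add(node)
    return f"drained {node}"

ras = RasDispatcher()
ras.on("gpu.double_bit_error", drain_node)
results = ras.emit("gpu.double_bit_error", "node042")
ignored = ras.emit("fan.speed_warning", "node007")   # no action registered
```

Keeping the event log separate from the action table means events without a registered action are still recorded for later analysis.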
Citations: 3
Pre-exascale accelerated application development: The ORNL Summit experience
IF 1.3; Zone 4 (Computer Science); Q1 Computer Science; Pub Date: 2020-01-15; DOI: 10.1147/JRD.2020.2965881
L. Luo;T. P. Straatsma;L. E. Aguilar Suarez;R. Broer;D. Bykov;E. F. D'Azevedo;S. S. Faraji;K. C. Gottiparthi;C. De Graaf;J. A. Harris;R. W. A. Havenith;H. J. Aa. Jensen;W. Joubert;R. K. Kathir;J. Larkin;Y. W. Li;D. I. Lyakh;O. E. B. Messer;M. R. Norman;J. C. Oefelein;R. Sankaran;A. F. Tillack;A. L. Barnes;L. Visscher;J. C. Wells;M. Wibowo
High-performance computing (HPC) increasingly relies on heterogeneous architectures to achieve higher performance. At the Oak Ridge Leadership Computing Facility (OLCF), Oak Ridge, TN, USA, this trend continues as its latest supercomputer, Summit, entered production in early 2019. The combination of IBM POWER9 CPUs and NVIDIA V100 GPUs, along with a fast NVLink2 interconnect and other recent technologies, pushes system performance to a new height and breaks the exascale barrier by certain measures. Due to Summit's powerful GPUs and much higher GPU–CPU ratio, offloading to accelerators becomes a requirement for any application that intends to use the system effectively. To facilitate navigating a complex landscape of competing heterogeneous architectures, a collection of applications from a wide spectrum of scientific domains was selected for early adoption on Summit. In this article, the experience and lessons learned are summarized, in the hope of providing useful guidance to address new programming challenges, such as scalability, performance portability, and software maintainability, for future application development efforts on heterogeneous HPC systems.
Citations: 10