Pub Date: 2020-06-30 DOI: 10.35429/jrd.2021.19.7.22.30
R. Fornés-Rivera, Marco Antonio Conant-Pablos, Adolfo Cano-Carrasco, Gildardo Guadalupe López-Rojo
This research was carried out in the production and quality area of a company that manufactures frames and moldings. It addresses the need for improvement actions in response to rework and low production at the patching workstation, caused by flaws such as poor patching, bumps, bubbles, and porosity in the products. At the time of the study there was a production record of 1.75% and rework of 19.25% in the first hours of the working day. The objective was to implement improvement actions through the 8D methodology in order to reduce rework and increase production. The procedure involved forming a team; defining the problem; implementing containment actions; identifying and verifying the root cause; determining permanent corrective actions; identifying and implementing permanent corrective actions; preventing recurrence of the problem and/or root cause; and acknowledging the team's effort. The work increased production and reduced rework at the patching workstation, thus fulfilling the objective of the research.
Title: Implementation of improvement actions in a company that produces frames and moldings; Journal: IBM Journal of Research and Development; Volume: 192 1
Pub Date: 2020-06-30 DOI: 10.35429/jrd.2020.17.6.15.20
Jaime Jalomo, Edith Preciado, Jorge Gudiño
Biomedical signals are currently a subject of cutting-edge study. Thanks to advances in artificial intelligence, new methods for processing these signals are implemented every day, mainly to detect anomalies or diseases with greater precision. A solution based on deep learning is proposed. This technology has proven efficient in handling data with high-level features, and it includes convolutional neural networks (CNNs), which are well suited to image processing. In this paper, electrocardiographic (ECG) signals generated from a dynamic mathematical model are classified with a CNN that has two convolution layers.
Title: Design of a convolutional neural network for classification of biomedical signals; Journal: IBM Journal of Research and Development; Volume: 39 1
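The abstract above describes a classifier with two convolution layers applied to ECG signals. The following is only a minimal sketch of the forward pass of such a network, written in plain Python. The layer sizes, the kernel widths, the number of classes, the random (untrained) weights, and the toy `synthetic_beat` signal are all assumptions made for illustration; they are not taken from the paper, and training is omitted.

```python
import math
import random


def conv1d_relu(signal, kernels, biases):
    """Valid 1-D convolution followed by ReLU.

    signal  : [in_channels][time]
    kernels : [out_channels][in_channels][width]
    """
    width = len(kernels[0][0])
    steps = len(signal[0]) - width + 1
    out = []
    for kernel, bias in zip(kernels, biases):
        row = []
        for t in range(steps):
            acc = bias
            for c, channel in enumerate(signal):
                for k in range(width):
                    acc += kernel[c][k] * channel[t + k]
            row.append(max(acc, 0.0))  # ReLU
        out.append(row)
    return out


def max_pool(feature_maps, size):
    """Non-overlapping max pooling along the time axis."""
    return [[max(row[i:i + size]) for i in range(0, len(row) - size + 1, size)]
            for row in feature_maps]


def dense_softmax(features, weights, biases):
    """Fully connected layer followed by a numerically stable softmax."""
    logits = [sum(w * x for w, x in zip(ws, features)) + b
              for ws, b in zip(weights, biases)]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_kernels(rng, out_ch, in_ch, width):
    return [[[rng.uniform(-0.5, 0.5) for _ in range(width)]
             for _ in range(in_ch)] for _ in range(out_ch)]


def make_params(rng, length, n_classes):
    """Hypothetical sizes: 4 kernels of width 5, then 8 kernels of width 3."""
    steps = (length - 5 + 1) // 2   # after conv 1 and pooling
    steps = (steps - 3 + 1) // 2    # after conv 2 and pooling
    n_flat = 8 * steps
    return {
        "k1": random_kernels(rng, 4, 1, 5), "b1": [0.0] * 4,
        "k2": random_kernels(rng, 8, 4, 3), "b2": [0.0] * 8,
        "w": [[rng.uniform(-0.1, 0.1) for _ in range(n_flat)]
              for _ in range(n_classes)],
        "b": [0.0] * n_classes,
    }


def classify(signal, params):
    """Two convolution layers, each followed by pooling, then a softmax layer."""
    x = [signal]  # a single input channel
    x = max_pool(conv1d_relu(x, params["k1"], params["b1"]), 2)
    x = max_pool(conv1d_relu(x, params["k2"], params["b2"]), 2)
    flat = [v for row in x for v in row]
    return dense_softmax(flat, params["w"], params["b"])


def synthetic_beat(length=64):
    """Toy stand-in for a modelled ECG beat: a narrow peak plus a broad wave."""
    return [math.exp(-((i - 20) / 2.0) ** 2)
            + 0.3 * math.exp(-((i - 40) / 6.0) ** 2)
            for i in range(length)]


rng = random.Random(0)
params = make_params(rng, 64, 3)
probs = classify(synthetic_beat(), params)
print(probs)  # class probabilities from untrained weights
```

Because the weights are random, the printed probabilities carry no diagnostic meaning; the sketch only shows how a signal flows through two convolution stages into a class distribution.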
Pub Date: 2020-05-01 DOI: 10.35429/jrd.2019.17.6.1.8
M. G. Rosales-Sosa, Manuel Garcia-Yregoi, Blanca Idalia Rosales-Sosa, R. Servin-Castañeda
Samples of barite mineral ore were ground to 250 mesh and then subjected to a leaching stage with hydrochloric acid for different times. The leached barite ore was then subjected to a carbonation stage in which parameters such as pH, temperature, time, and agitation speed were controlled. Finally, it was subjected to a sintering stage with the Fe2O3 precursor obtained from the waste powder of a steelmaking company's rolling process, at temperatures between 1000 and 1200 °C, for times of 12 and 24. The materials obtained were characterized by infrared (IR) spectroscopy, X-ray diffraction (XRD), and scanning electron microscopy (SEM).
Title: Synthesis of barium ferrite, using barite mineral ore and a metallurgical waste; Journal: IBM Journal of Research and Development; Volume: 135 1
Pub Date: 2020-03-13 DOI: 10.1147/JRD.2020.2976169
Title: Preface: Summit and Sierra Supercomputers; Journal: IBM Journal of Research and Development; Volume: 64 3/4; Pages: 1-4
Pub Date: 2020-03-04 Epub Date: 2019-11-15 DOI: 10.1016/j.ymthe.2019.11.011
Jamie L Shirley, Geoffrey D Keeler, Alexandra Sherman, Irene Zolotukhin, David M Markusic, Brad E Hoffman, Laurence M Morel, Mark A Wallet, Cox Terhorst, Roland W Herzog
Adeno-associated virus (AAV) vectors are widely used in clinical gene therapy to correct genetic disease by in vivo gene transfer. Although the vectors are useful, in part because of their limited immunogenicity, immune responses directed at vector components have complicated applications in humans. These include, for instance, innate immune sensing of vector components by plasmacytoid dendritic cells (pDCs), which sense the vector DNA genome via Toll-like receptor 9. Adaptive immune responses employ antigen presentation by conventional dendritic cells (cDCs), which leads to cross-priming of capsid-specific CD8+ T cells. In this study, we sought to determine the mechanisms that promote licensing of cDCs, which is requisite for CD8+ T cell activation. Blockage of type 1 interferon (T1 IFN) signaling by monoclonal antibody therapy prevented cross-priming. Furthermore, experiments in cell-type-restricted knockout mice showed a specific requirement for the receptor for T1 IFN (IFNaR) in cDCs. In contrast, natural killer (NK) cells are not needed, indicating a direct rather than indirect effect of T1 IFN on cDCs. In addition, co-stimulation by CD4+ T cells via CD40-CD40L was required for cross-priming, and blockage of co-stimulation but not of T1 IFN additionally reduced antibody formation against capsid. These mechanistic insights inform the development of targeted immune interventions.
Title: Type I IFN Sensing by cDCs and CD4+ T Cell Help Are Both Requisite for Cross-Priming of AAV Capsid-Specific CD8+ T Cells; Journal: IBM Journal of Research and Development; Volume: 34 1; Pages: 758-770
Pub Date: 2020-01-31 DOI: 10.1147/JRD.2020.2970903
C. M. Albrecht, B. Elmegreen, O. Gunawan, H. F. Hamann, L. J. Klein, S. Lu, F. Mariano, C. Siebenschuh, J. Schmude
Traditional geographic information systems (GIS) have been disrupted by the emergence of Big Data in the form of geo-coded raster, vector, and time-series Internet-of-Things data. This article discusses the application of new scalable technologies that go far beyond relational databases and file-based storage on spinning disk or tape to incorporate both data storage and processing in the same platform. The roles of the Apache Hadoop Distributed File System and NoSQL key-value stores such as Apache HBase are discussed, along with indexing schemes that optimally support geospatial-temporal use. We highlight how this new approach can rapidly search multiple GIS data layers to obtain insights in the context of early warning, impact evaluation, response, and recovery for earthquake and wildfire disasters.
Title: Next-generation geospatial-temporal information technologies for disaster management; Journal: IBM Journal of Research and Development; Volume: 64 1/2; Pages: 5:1-5:12
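The abstract above mentions indexing schemes that support geospatial-temporal queries on key-value stores, without naming a specific scheme. One common approach, shown here purely as an illustrative assumption, is to build row keys from a Z-order (Morton) code of the quantized longitude and latitude followed by a timestamp, so that a sorted key-value store keeps all observations of one grid cell contiguous and in time order. The function names, the key layout, and the 16-bit resolution are hypothetical.

```python
def quantize(value, low, high, bits):
    """Map a value in [low, high] to an integer grid index with `bits` bits."""
    top = (1 << bits) - 1
    frac = (value - low) / (high - low)
    return min(top, max(0, int(frac * (top + 1))))


def interleave(x, y, bits):
    """Morton code: bits of x go to even positions, bits of y to odd ones."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def row_key(layer, lat, lon, timestamp, bits=16):
    """Hypothetical key layout: <layer>:<cell in hex>:<zero-padded epoch seconds>."""
    cell = interleave(quantize(lon, -180.0, 180.0, bits),
                      quantize(lat, -90.0, 90.0, bits), bits)
    hex_width = (bits + 1) // 2  # 2 * bits binary digits as hex digits
    return f"{layer}:{cell:0{hex_width}x}:{timestamp:010d}"


# A plain dict stands in for the key-value store in this sketch.
store = {
    row_key("wildfire", 37.0, -120.0, 1577836800): "reading A",
    row_key("wildfire", 37.0, -120.0, 1577840400): "reading B",
    row_key("wildfire", -33.9, 151.2, 1577836800): "reading C",
}

# Prefix scan: every timestamped entry for one layer and one grid cell.
prefix = row_key("wildfire", 37.0, -120.0, 0).rsplit(":", 1)[0] + ":"
hits = [store[k] for k in sorted(store) if k.startswith(prefix)]
print(hits)
```

Zero-padding the cell code and the timestamp keeps lexicographic key order aligned with numeric order, which is what lets a range scan over a key prefix return one cell's history in time order.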
Pub Date: 2020-01-16 DOI: 10.1147/JRD.2020.2967311
S. S. Sharkawi, G. A. Chochia
The U.S. Department of Energy CORAL program systems SUMMIT and SIERRA are based on hybrid servers comprising IBM POWER9 CPUs and NVIDIA V100 graphics processing units (GPUs) connected by two extended data rate (EDR) links to a high-speed InfiniBand network. A major challenge for the communication software stack is to optimize performance for all combinations of data origin and destination: host or GPU memory, same or different server. Alternate paths exist for routing data from GPU memory. When origin and destination are on different servers, data can be sent either via host memory or, bypassing host memory, with the GPUDirect feature. When origin and destination are on the same server, host memory can be bypassed with peer-to-peer interprocess communication (IPC). For large messages, pipelining makes the host-memory data path competitive with GPUDirect. In this article, we explain the techniques used in the Spectrum MPI Parallel Active Message Interface layer to cache memory types and attributes in order to reduce the overhead associated with calling the CUDA application programming interface (API); in addition, we detail the different protocols used for the different memory types: device memory, managed memory, and host memory. To illustrate, the caching technique achieved a device-to-device latency improvement of 26% for intranode transfers and 19% for internode transfers.
Title: Communication protocol optimization for enhanced GPU performance; Journal: IBM Journal of Research and Development; Volume: 64 3/4; Pages: 9:1-9:9
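The abstract above describes caching memory types and attributes so that the communication layer does not have to call the CUDA API for every message. The sketch below illustrates that general idea only; it is not the Spectrum MPI implementation. `AttributeCache`, `slow_query`, and the `ALLOCATIONS` table are hypothetical names, and `slow_query` merely simulates an expensive pointer-attribute lookup (in a real system this role would be played by a CUDA runtime query).

```python
from enum import Enum


class MemType(Enum):
    HOST = "host"
    DEVICE = "device"
    MANAGED = "managed"


# Simulated allocation table: (base address, size in bytes, memory type).
ALLOCATIONS = [
    (0x1000, 0x1000, MemType.DEVICE),
    (0x8000, 0x2000, MemType.MANAGED),
]


def slow_query(address):
    """Stand-in for an expensive driver call that classifies a pointer."""
    for base, size, mem_type in ALLOCATIONS:
        if base <= address < base + size:
            return base, size, mem_type
    # Anything unknown is treated as ordinary host memory in this sketch.
    return address, 1, MemType.HOST


class AttributeCache:
    """Remembers the memory type of each allocation after the first query."""

    def __init__(self, query):
        self._query = query
        self._entries = {}  # base address -> (size, MemType)
        self.misses = 0     # number of times the slow query was needed

    def lookup(self, address):
        for base, (size, mem_type) in self._entries.items():
            if base <= address < base + size:
                return mem_type  # cache hit: no driver call
        self.misses += 1
        base, size, mem_type = self._query(address)
        self._entries[base] = (size, mem_type)
        return mem_type

    def invalidate(self, base):
        """Must be called when an allocation is freed, or entries go stale."""
        self._entries.pop(base, None)


cache = AttributeCache(slow_query)
first = cache.lookup(0x1800)   # miss: queries and caches the whole allocation
second = cache.lookup(0x1004)  # hit: same allocation, no second query
print(first, second, cache.misses)
```

The linear scan over cached entries keeps the sketch short; a production cache would more likely use an interval tree or a sorted structure, and it would need a reliable hook on deallocation so that a reused address is never reported with a stale memory type.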
Pub Date: 2020-01-16 DOI: 10.1147/JRD.2020.2967330
C. B. Stunkel, R. L. Graham, G. Shainer, M. Kagan, S. S. Sharkawi, B. Rosenburg, G. A. Chochia
Oak Ridge National Laboratory's Summit supercomputer and Lawrence Livermore National Laboratory's Sierra supercomputer utilize InfiniBand interconnect in a Fat-tree network topology, interconnecting all compute nodes, storage nodes, administration, and management nodes into one linearly scalable network. These networks are based on Mellanox 100-Gb/s EDR InfiniBand ConnectX-5 adapters and Switch-IB2 switches, with compute-rack packaging and cooling contributions from IBM. These devices support in-network computing acceleration engines such as Mellanox Scalable Hierarchical Aggregation and Reduction Protocol, graphics processor unit (GPU) Direct RDMA, advanced adaptive routing, Quality of Service, and other network and application acceleration. The overall IBM Spectrum Message Passing Interface (MPI) messaging software stack implements Open MPI, and was a collaboration between IBM, Mellanox, and NVIDIA to optimize direct communication between endpoints, whether compute nodes (with IBM POWER CPUs, NVIDIA GPUs, and flash memory devices), or POWER-hosted storage nodes. The Fat-tree network can isolate traffic among the compute partitions and to/from the storage subsystem, providing more predictable application performance. In addition, the high level of redundancy of this network and its reconfiguration capability ensure reliable high performance even after network component failures. This article details the hardware and software architecture and performance of the networks and describes a number of the high-performance computing (HPC) enhancements engineered into this generation of InfiniBand.
Title: The high-speed networks of the Summit and Sierra supercomputers; Journal: IBM Journal of Research and Development; Volume: 64 3/4; Pages: 3:1-3:10
Pub Date: 2020-01-16 DOI: 10.1147/JRD.2020.2967270
T. Liebsch
The deployment of any high-performance computer system typically includes an acceptance process to validate the system's specifications, covering hardware, software, and delivered services. In this article, we describe the efforts undertaken by IBM and its partners to accomplish early preparations and then concurrently deliver, stabilize, and accept the two fastest supercomputers in the world at the time of deployment.
Title: Concurrent installation and acceptance of Summit and Sierra supercomputers; Journal: IBM Journal of Research and Development; Volume: 64 3/4; Pages: 6:1-6:8
Pub Date: 2020-01-16 DOI: 10.1147/JRD.2020.2967309
N. Besaw, L. Scheidenbach, J. Dunham, S. Kaur, A. Ohmacht, F. Pizzano, Y. Park
Cluster system management (CSM) was co-designed with the Department of Energy Labs to provide the support necessary to effectively manage the Summit and Sierra supercomputers. The CSM system administration tools provide a unified view of a large-scale cluster and the ability to examine and understand data from multiple sources. CSM consists of five components: 1) application programming interfaces (APIs) and infrastructure; 2) Big Data Store; 3) support for reliability, availability, and serviceability (RAS); 4) Diagnostic and Health Check; and 5) support for job management. APIs and infrastructure provide lightweight daemons for compute nodes, hardware and software inventory collection, job accounting, and RAS. Logs, environmental data, and performance data are collected in the Big Data Store for analysis. RAS events can trigger corrective actions by CSM. Diagnostic and Health Check are provided through a diagnostic framework and test results collection. To support job management, CSM coordinates with the Job Step Manager (JSM) to provide an overlay network of JSM daemons. CSM is open source and available at https://github.com/IBM/CAST