IBM Journal of Research and Development: Latest Articles

Implementation of improvement actions in a company that produces frames and moldings

IF 1.3, Zone 4 (Computer Science), Q1 Computer Science

IBM Journal of Research and Development

Pub Date : 2020-06-30 DOI: 10.35429/jrd.2021.19.7.22.30

R. Fornés-Rivera, Marco Antonio Conant-Pablos, Adolfo Cano-Carrasco, Gildardo Guadalupe López-Rojo

This research was carried out in the production and quality area of a company that manufactures frames and moldings. It addresses the need for improvement actions prompted by rework and low production at the patching workstation, caused by defects such as poor patching, bumps, bubbles, and porosity in the products. At the time of the study, the recorded production was 1.75% and rework was 19.25% in the first hours of the working day. The objective was to implement improvement actions, through the 8D methodology, to reduce rework and increase production. The procedure consisted of forming a team; defining the problem; implementing containment actions; identifying and verifying the root cause; determining permanent corrective actions; identifying and implementing permanent corrective actions; preventing the recurrence of the problem and/or root cause; and acknowledging the effort of the team. The work increased production and reduced rework at the patching workstation, thus fulfilling the objective of the research.

Citations: 0

Design of a convolutional neural network for classification of biomedical signals

IF 1.3, Zone 4 (Computer Science), Q1 Computer Science

IBM Journal of Research and Development

Pub Date : 2020-06-30 DOI: 10.35429/jrd.2020.17.6.15.20

Jaime Jalomo, Edith Preciado, Jorge Gudiño

Biomedical signals are an active area of cutting-edge study. Thanks to advances in artificial intelligence, new methods for processing these signals are implemented every day, mainly to detect anomalies or diseases with greater precision. A solution based on deep learning is proposed. This technology has proven efficient at handling data with high-level features; within it, convolutional neural networks (CNNs), which are ideal for image processing, stand out. In this paper, electrocardiographic (ECG) signals generated from a dynamic mathematical model are classified using a CNN with two convolution layers.
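To make the idea of a network with two convolution layers concrete, here is a minimal forward-pass sketch in plain Python. The abstract gives no architecture details, so everything specific is an assumption made for illustration: the kernel widths, the channel counts, the random (untrained) weights, the pooling steps, and the toy signal generator `toy_ecg`, which is only a stand-in for the dynamic ECG model mentioned in the abstract.

```python
import math
import random


def conv1d(signal, kernels, bias):
    """Valid 1-D convolution (cross-correlation) over a multi-channel signal.

    signal:  list of channels, each a list of floats of equal length
    kernels: kernels[o][c] is the kernel applied to input channel c for output o
    bias:    one float per output channel
    """
    width = len(kernels[0][0])
    out_len = len(signal[0]) - width + 1
    out = []
    for o, per_channel in enumerate(kernels):
        row = []
        for t in range(out_len):
            acc = bias[o]
            for c, kernel in enumerate(per_channel):
                for j in range(width):
                    acc += kernel[j] * signal[c][t + j]
            row.append(acc)
        out.append(row)
    return out


def relu(channels):
    return [[max(0.0, v) for v in ch] for ch in channels]


def max_pool(channels, size):
    """Non-overlapping max pooling along the time axis."""
    return [[max(ch[i:i + size]) for i in range(0, len(ch) - size + 1, size)]
            for ch in channels]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_kernels(rng, n_out, n_in, width):
    return [[[rng.uniform(-0.5, 0.5) for _ in range(width)]
             for _ in range(n_in)] for _ in range(n_out)]


def make_params(seed=0):
    """Untrained weights; the layer sizes are illustrative assumptions."""
    rng = random.Random(seed)
    return {
        "k1": random_kernels(rng, 4, 1, 5), "b1": [0.0] * 4,
        "k2": random_kernels(rng, 8, 4, 3), "b2": [0.0] * 8,
        "w": [[rng.uniform(-0.5, 0.5) for _ in range(8)] for _ in range(2)],
        "b": [0.0, 0.0],
    }


def toy_ecg(n=64, period=32):
    """Hypothetical stand-in signal: one Gaussian spike per period."""
    return [math.exp(-((t % period - period / 2) ** 2) / 4.0) for t in range(n)]


def classify(ecg, params):
    x = [ecg]                                                     # one input channel
    x = max_pool(relu(conv1d(x, params["k1"], params["b1"])), 2)  # convolution layer 1
    x = max_pool(relu(conv1d(x, params["k2"], params["b2"])), 2)  # convolution layer 2
    features = [sum(ch) / len(ch) for ch in x]                    # global average pooling
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(params["w"], params["b"])]
    return softmax(scores)                                        # class probabilities


params = make_params()
probs = classify(toy_ecg(), params)
print(probs)
```

Because the weights are random, the printed probabilities carry no diagnostic meaning; a real classifier would learn the kernels from labeled signals, typically with a deep learning framework rather than hand-written loops.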

Citations: 0

Synthesis of barium ferrite, using barite mineral ore and a metallurgical waste

IF 1.3, Zone 4 (Computer Science), Q1 Computer Science

IBM Journal of Research and Development

Pub Date : 2020-05-01 DOI: 10.35429/jrd.2019.17.6.1.8

M. G. ROSALES-SOSA, Manuel Garcia-Yregoi, Blanca Idalia Rosales-Sosa, R. Servin-Castañeda

Samples of barite mineral ore were ground to 250 mesh and then subjected to a leaching stage with hydrochloric acid for different durations. The leached barite ore was then subjected to a carbonation stage in which parameters such as pH, temperature, time, and agitation speed were controlled. Finally, it was subjected to a sintering stage with the Fe2O3 precursor obtained from the waste powder of the steelmaking company's rolling process, at temperatures between 1000 and 1200 °C, for 12 and 24 times. The materials obtained were characterized by infrared (IR) spectroscopy, X-ray diffraction (XRD), and scanning electron microscopy (SEM).

Citations: 0

Preface: Summit and Sierra Supercomputers

IF 1.3, Zone 4 (Computer Science), Q1 Computer Science

IBM Journal of Research and Development

Pub Date : 2020-03-13 DOI: 10.1147/JRD.2020.2976169

Citations: 1

Type I IFN Sensing by cDCs and CD4⁺ T Cell Help Are Both Requisite for Cross-Priming of AAV Capsid-Specific CD8⁺ T Cells.

Zone 4 (Computer Science), Q1 Computer Science

IBM Journal of Research and Development

Pub Date : 2020-03-04 Epub Date: 2019-11-15 DOI: 10.1016/j.ymthe.2019.11.011

Jamie L Shirley, Geoffrey D Keeler, Alexandra Sherman, Irene Zolotukhin, David M Markusic, Brad E Hoffman, Laurence M Morel, Mark A Wallet, Cox Terhorst, Roland W Herzog

Adeno-associated virus (AAV) vectors are widely used in clinical gene therapy to correct genetic disease by in vivo gene transfer. Although the vectors are useful, in part because of their limited immunogenicity, immune responses directed at vector components have complicated applications in humans. These include, for instance, innate immune sensing of vector components by plasmacytoid dendritic cells (pDCs), which sense the vector DNA genome via Toll-like receptor 9. Adaptive immune responses employ antigen presentation by conventional dendritic cells (cDCs), which leads to cross-priming of capsid-specific CD8⁺ T cells. In this study, we sought to determine the mechanisms that promote licensing of cDCs, which is requisite for CD8⁺ T cell activation. Blockage of type 1 interferon (T1 IFN) signaling by monoclonal antibody therapy prevented cross-priming. Furthermore, experiments in cell-type-restricted knockout mice showed a specific requirement for the receptor for T1 IFN (IFNaR) in cDCs. In contrast, natural killer (NK) cells are not needed, indicating a direct rather than indirect effect of T1 IFN on cDCs. In addition, co-stimulation by CD4⁺ T cells via CD40-CD40L was required for cross-priming, and blockage of co-stimulation but not of T1 IFN additionally reduced antibody formation against capsid. These mechanistic insights inform the development of targeted immune interventions.

Volume 34, Issue 1, Pages 758-770

Citations: 0

Next-generation geospatial-temporal information technologies for disaster management

IF 1.3, Zone 4 (Computer Science), Q1 Computer Science

IBM Journal of Research and Development

Pub Date : 2020-01-31 DOI: 10.1147/JRD.2020.2970903

C. M. Albrecht;B. Elmegreen;O. Gunawan;H. F. Hamann;L. J. Klein;S. Lu;F. Mariano;C. Siebenschuh;J. Schmude

Traditional geographic information systems (GIS) have been disrupted by the emergence of Big Data in the form of geo-coded raster, vector, and time-series Internet-of-Things data. This article discusses the application of new scalable technologies that go far beyond relational databases and file-based storage on spinning disk or tape to incorporate both the storage and the processing of data in the same platform. The roles of the Apache Hadoop Distributed File System (HDFS) and NoSQL key-value stores such as Apache HBase are discussed, along with indexing schemes that optimally support geospatial-temporal use. We highlight how this new approach can rapidly search multiple GIS data layers to obtain insights in the context of early warning, impact evaluation, response, and recovery for earthquake and wildfire disasters.
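The abstract does not say which indexing scheme the authors use. As one common way to make geospatial-temporal data searchable in a sorted key-value store, the sketch below builds a row key from a Z-order (Morton) code of the quantized longitude and latitude plus a UTC day bucket. The key layout, the 16-bit resolution, and the names `row_key` and `layer` are assumptions chosen for illustration, not the article's design.

```python
from datetime import datetime, timezone

BITS = 16  # assumed resolution per axis: 65,536 cells along each of lon and lat


def quantize(value, lo, hi, bits=BITS):
    """Map a value in [lo, hi] to an integer cell index in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    frac = (value - lo) / (hi - lo)
    return min(cells, max(0, int(frac * cells)))


def interleave(x, y, bits=BITS):
    """Z-order (Morton) code: bits of x go to even positions, bits of y to odd."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def row_key(lat, lon, when, layer):
    """Build a sortable key '<layer>:<z-order hex>:<UTC day>' (hypothetical layout)."""
    day = when.astimezone(timezone.utc).strftime("%Y%m%d")
    z = interleave(quantize(lon, -180.0, 180.0), quantize(lat, -90.0, 90.0))
    return f"{layer}:{z:08x}:{day}"


key = row_key(0.0, 0.0, datetime(2020, 1, 31, tzinfo=timezone.utc), "wildfire")
print(key)
```

Because a Z-order code keeps most nearby cells close together in key order, a bounding-box query can be served by a small number of range scans over the sorted keys, which is the kind of access pattern that stores such as HBase handle well.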

Citations: 8

Communication protocol optimization for enhanced GPU performance

IF 1.3, Zone 4 (Computer Science), Q1 Computer Science

IBM Journal of Research and Development

Pub Date : 2020-01-16 DOI: 10.1147/JRD.2020.2967311

S. S. Sharkawi;G. A. Chochia

The U.S. Department of Energy CORAL program systems SUMMIT and SIERRA are based on hybrid servers comprising IBM POWER9 CPUs and NVIDIA V100 graphics processing units (GPUs), connected by two extended data rate (EDR) links to a high-speed InfiniBand network. A major challenge for the communication software stack is to optimize performance for all combinations of data origin and destination: host or GPU memory, same or different server. Alternate paths exist for routing data from GPU memory. When origin and destination are on different servers, data can be sent either via host memory or, bypassing host memory, with the GPUDirect feature. When origin and destination are on the same server, host memory can be bypassed with peer-to-peer interprocess communication (IPC). For large messages, pipelining makes the host memory data path competitive with GPUDirect. In this article, we explain the techniques used in the Spectrum MPI Parallel Active Message Interface layer to cache memory types and attributes in order to reduce the overhead associated with calling the CUDA application programming interface (API); in addition, we detail the different protocols used for different memory types: device memory, managed memory, and host memory. To illustrate, the caching technique achieved a device-to-device latency improvement of 26% for intranode transfers and 19% for internode transfers.
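The two ideas in this abstract, caching a buffer's memory type so the expensive attribute query runs only once, and picking a transfer path from the memory types and node placement, can be sketched as follows. This is a simplified model, not the Spectrum MPI implementation: the `query` callback stands in for a CUDA pointer-attribute lookup, the class and function names are hypothetical, managed memory is treated like device memory for brevity, and the 1 MiB pipelining threshold is an arbitrary assumption.

```python
from enum import Enum


class MemKind(Enum):
    HOST = "host"
    DEVICE = "device"
    MANAGED = "managed"


class AttributeCache:
    """Remember the memory kind of each buffer so the costly query runs once."""

    def __init__(self, query):
        self._query = query   # expensive lookup, called only on a cache miss
        self._ranges = {}     # base address -> (length, MemKind)
        self.misses = 0

    def kind_of(self, base, length):
        hit = self._ranges.get(base)
        if hit is not None and hit[0] >= length:
            return hit[1]
        self.misses += 1
        kind = self._query(base)
        self._ranges[base] = (length, kind)
        return kind

    def invalidate(self, base):
        """Must be called when the buffer is freed, or the entry goes stale."""
        self._ranges.pop(base, None)


def choose_protocol(src, dst, same_node, nbytes, pipeline_threshold=1 << 20):
    """Pick a transfer path; the threshold is an illustrative assumption."""
    if src is MemKind.HOST and dst is MemKind.HOST:
        return "host-to-host"
    if same_node:
        return "peer-to-peer IPC"        # bypasses host memory within a server
    if nbytes >= pipeline_threshold:
        return "pipelined host staging"  # large messages: staged through host memory
    return "GPUDirect"                   # small messages: bypass host memory


lookups = []


def fake_query(base):
    lookups.append(base)                 # record how often the slow path runs
    return MemKind.DEVICE


cache = AttributeCache(fake_query)
for _ in range(3):
    kind = cache.kind_of(0x1000, 4096)
print(kind, cache.misses)
print(choose_protocol(kind, MemKind.DEVICE, same_node=False, nbytes=64))
```

The cache is only safe if entries are invalidated when a buffer is freed or reallocated, which is why the sketch exposes `invalidate`; how the real library tracks buffer lifetimes is not described in the abstract.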

Volume 64, Issue 3/4, Pages 9:1-9:9

Citations: 5

The high-speed networks of the Summit and Sierra supercomputers

IF 1.3, Zone 4 (Computer Science), Q1 Computer Science

IBM Journal of Research and Development

Pub Date : 2020-01-16 DOI: 10.1147/JRD.2020.2967330

C. B. Stunkel;R. L. Graham;G. Shainer;M. Kagan;S. S. Sharkawi;B. Rosenburg;G. A. Chochia

Oak Ridge National Laboratory's Summit supercomputer and Lawrence Livermore National Laboratory's Sierra supercomputer utilize InfiniBand interconnect in a Fat-tree network topology, interconnecting all compute nodes, storage nodes, administration, and management nodes into one linearly scalable network. These networks are based on Mellanox 100-Gb/s EDR InfiniBand ConnectX-5 adapters and Switch-IB2 switches, with compute-rack packaging and cooling contributions from IBM. These devices support in-network computing acceleration engines such as Mellanox Scalable Hierarchical Aggregation and Reduction Protocol, graphics processor unit (GPU) Direct RDMA, advanced adaptive routing, Quality of Service, and other network and application acceleration. The overall IBM Spectrum Message Passing Interface (MPI) messaging software stack implements Open MPI, and was a collaboration between IBM, Mellanox, and NVIDIA to optimize direct communication between endpoints, whether compute nodes (with IBM POWER CPUs, NVIDIA GPUs, and flash memory devices), or POWER-hosted storage nodes. The Fat-tree network can isolate traffic among the compute partitions and to/from the storage subsystem, providing more predictable application performance. In addition, the high level of redundancy of this network and its reconfiguration capability ensures reliable high performance even after network component failures. This article details the hardware and software architecture and performance of the networks and describes a number of the high-performance computing (HPC) enhancements engineered into this generation of InfiniBand.
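To give a feel for why a fat-tree scales to machines of this size, the sketch below computes the maximum number of endpoints a non-blocking fat-tree can connect when every switch has the same port count. The formula is the standard one for folded-Clos networks; the 36-port radix used in the example is an assumption that does not appear in the abstract, and the result says nothing about how the Summit or Sierra networks are actually provisioned.

```python
def fat_tree_endpoints(radix, levels):
    """Maximum endpoints of a non-blocking fat-tree built from `radix`-port switches.

    Every switch below the top level splits its ports evenly: half face down,
    half face up. The top level uses all of its ports facing down.
    """
    if radix < 2 or radix % 2 or levels < 1:
        raise ValueError("radix must be an even number >= 2 and levels >= 1")
    return 2 * (radix // 2) ** levels


for levels in (1, 2, 3):
    print(levels, fat_tree_endpoints(36, levels))
```

Each added level multiplies capacity by half the radix, so a third level is what takes a network from hundreds of endpoints to thousands.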

Volume 64, Issue 3/4, Pages 3:1-3:10

Citations: 14

Concurrent installation and acceptance of Summit and Sierra supercomputers

IF 1.3, Zone 4 (Computer Science), Q1 Computer Science

IBM Journal of Research and Development

Pub Date : 2020-01-16 DOI: 10.1147/JRD.2020.2967270

T. Liebsch

The deployment of any high-performance computer system typically includes an acceptance process to validate the system's specifications, covering hardware, software, and delivered services. In this article, we describe the efforts undertaken by IBM and its partners to accomplish early preparations and then concurrently deliver, stabilize, and accept the two fastest supercomputers in the world at the time of deployment.

Citations: 1

Cluster system management

IF 1.3, Zone 4 (Computer Science), Q1 Computer Science

IBM Journal of Research and Development

Pub Date : 2020-01-16 DOI: 10.1147/JRD.2020.2967309

N. Besaw;L. Scheidenbach;J. Dunham;S. Kaur;A. Ohmacht;F. Pizzano;Y. Park

Cluster system management (CSM) was co-designed with the Department of Energy Labs to provide the support necessary to effectively manage the Summit and Sierra supercomputers. The CSM system administration tools provide a unified view of a large-scale cluster and the ability to examine and understand data from multiple sources. CSM consists of five components: 1) application programming interfaces (APIs) and infrastructure; 2) Big Data Store; 3) support for reliability, availability, and serviceability (RAS); 4) Diagnostic and Health Check; and 5) support for job management. APIs and infrastructure provide lightweight daemons for compute nodes, hardware and software inventory collection, job accounting, and RAS. Logs, environmental data, and performance data are collected in the Big Data Store for analysis. RAS events can trigger corrective actions by CSM. Diagnostic and Health Check are provided through a diagnostic framework and test results collection. To support job management, CSM coordinates with the Job Step Manager (JSM) to provide an overlay network of JSM daemons. CSM is open source and available at https://github.com/IBM/CAST. Documentation can be found at https://cast.readthedocs.io.
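The statement that RAS events can trigger corrective actions describes an event-to-handler dispatch pattern, which the sketch below illustrates in a few lines. It does not use or imitate the real CSM APIs: the `RasEvent` fields, the `RasDispatcher` class, the message identifier, and the node name are all hypothetical and exist only to show the control flow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RasEvent:
    msg_id: str     # hypothetical event identifier
    node: str       # node that reported the event
    severity: str   # e.g. "INFO", "WARNING", "FATAL"


class RasDispatcher:
    """Record every event and run the corrective actions registered for it."""

    def __init__(self):
        self._actions = {}   # msg_id -> list of callables taking a RasEvent
        self.history = []    # all events seen, kept for later analysis

    def on(self, msg_id, action):
        self._actions.setdefault(msg_id, []).append(action)

    def emit(self, event):
        self.history.append(event)
        return [action(event) for action in self._actions.get(event.msg_id, [])]


drained_nodes = set()


def drain_node(event):
    """Corrective action: take the node out of the scheduling pool."""
    drained_nodes.add(event.node)
    return f"drained {event.node}"


dispatcher = RasDispatcher()
dispatcher.on("gpu.ecc.uncorrectable", drain_node)

results = dispatcher.emit(RasEvent("gpu.ecc.uncorrectable", "node042", "FATAL"))
ignored = dispatcher.emit(RasEvent("fan.speed.low", "node007", "WARNING"))
print(results, ignored, sorted(drained_nodes))
```

Keeping the full event history separate from the actions mirrors the split the abstract draws between collecting data for analysis and reacting to individual events, though how CSM wires these together internally is not stated there.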

Volume 64, Issue 3/4, Pages 7:1-7:9

Citations: 3
