{"title":"Redefining IBM power system design for CORAL","authors":"S. Roberts;C. Mann;C. Marroquin","doi":"10.1147/JRD.2019.2963637","DOIUrl":null,"url":null,"abstract":"Stipulations in the 2014 Collaboration of Oak Ridge, Argonne, and Livermore (CORAL) joint procurement activity not only motivated a fundamental change in IBM's high-performance computer design, which refocused IBM power systems on compute nodes that can scale to 200 petaflops with access to 2.5 PB of memory, but also served the commercial market for single-server applications. The distribution of both processing elements and memory required a careful look at data movement. The resultant AC922 POWER9 system features NVIDIA V100 GPUs with cache line access granularity, more than double the IO bandwidth of PCIe Gen3, and low-latency interfaces interconnected by the state-of-the-art dual-rail Mellanox CAPI EDR HCAs running at 50 Gb/s. With processing units designed to operate at 250 and 300 W, a single system can produce up to 3,080 kW. The overall CORAL solutions achieved power usage effectiveness rankings in the top ten on the Green500. Previous power designs used uniquely designed cabinets and scaled-up infrastructure to achieve efficiency. For successful commercial use, our design uses industry-standard 19-in drawers and racks. Both air- and water-cooled solutions allow for use in a wide range of customer environments. This article documents the novel design features that facilitate data movement and enable new coherent programming models. It describes how three generations of system designs became the foundation for the CORAL contract fulfillment and illustrates key features and specifications of the final product.","PeriodicalId":55034,"journal":{"name":"IBM Journal of Research and Development","volume":"64 3/4","pages":"2:1-2:10"},"PeriodicalIF":1.3000,"publicationDate":"2020-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1147/JRD.2019.2963637","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IBM Journal of Research and Development","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/8949743/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Computer Science","Score":null,"Total":0}
引用次数: 2
Abstract
Stipulations in the 2014 Collaboration of Oak Ridge, Argonne, and Livermore (CORAL) joint procurement activity not only motivated a fundamental change in IBM's high-performance computer design, which refocused IBM power systems on compute nodes that can scale to 200 petaflops with access to 2.5 PB of memory, but also served the commercial market for single-server applications. The distribution of both processing elements and memory required a careful look at data movement. The resultant AC922 POWER9 system features NVIDIA V100 GPUs with cache line access granularity, more than double the IO bandwidth of PCIe Gen3, and low-latency interfaces interconnected by the state-of-the-art dual-rail Mellanox CAPI EDR HCAs running at 50 Gb/s. With processing units designed to operate at 250 and 300 W, a single system can produce up to 3,080 kW. The overall CORAL solutions achieved power usage effectiveness rankings in the top ten on the Green500. Previous power designs used uniquely designed cabinets and scaled-up infrastructure to achieve efficiency. For successful commercial use, our design uses industry-standard 19-in drawers and racks. Both air- and water-cooled solutions allow for use in a wide range of customer environments. This article documents the novel design features that facilitate data movement and enable new coherent programming models. It describes how three generations of system designs became the foundation for the CORAL contract fulfillment and illustrates key features and specifications of the final product.
期刊介绍:
The IBM Journal of Research and Development is a peer-reviewed technical journal, published bimonthly, which features the work of authors in the science, technology and engineering of information systems. Papers are written for the worldwide scientific research and development community and knowledgeable professionals.
Submitted papers are welcome from the IBM technical community and from non-IBM authors on topics relevant to the scientific and technical content of the Journal.