
IBM Journal of Research and Development: Latest Publications

Sierra Center of Excellence: Lessons learned
IF 1.3 · CAS Zone 4, Computer Science · Q1 Computer Science · Pub Date: 2019-12-20 · DOI: 10.1147/JRD.2019.2961069
J. P. Dahm;D. F. Richards;A. Black;A. D. Bertsch;L. Grinberg;I. Karlin;S. Kokkila-Schumacher;E. A. León;J. R. Neely;R. Pankajakshan;O. Pearce
The introduction of heterogeneous computing via GPUs from the Sierra architecture represented a significant shift in direction for computational science at Lawrence Livermore National Laboratory (LLNL), and therefore required significant preparation. Over the last five years, the Sierra Center of Excellence (CoE) has brought employees with specific expertise from IBM and NVIDIA together with LLNL in a concentrated effort to prepare applications, system software, and tools for the Sierra supercomputer. This article shares the process we applied for the CoE and documents lessons learned during the collaboration, with the hope that others will be able to learn from both our success and intermediate setbacks. We describe what we have found to work for the management of such a collaboration and best practices for algorithms and source code, system configuration and software stack, tools, and application performance.
Citations: 2
Transformation of application enablement tools on CORAL systems
IF 1.3 · CAS Zone 4, Computer Science · Q1 Computer Science · Pub Date: 2019-12-17 · DOI: 10.1147/JRD.2019.2960246
S. Maerean;E. K. Lee;H.-F. Wen;I-H. Chung
The CORAL project exhibits an important shift in the computational paradigm from homogeneous to heterogeneous computing, where applications run on both the CPU and the accelerator (e.g., GPU). Existing applications optimized to run only on the CPU have to be rewritten to adopt accelerators and retuned to achieve optimal performance. The shift in the computational paradigm requires application development tools (e.g., compilers, performance profilers and tracers, and debuggers) to change to better assist users. The CORAL project places a strong emphasis on open-source tools to create a collaborative environment in the tools community. In this article, we discuss the collaboration efforts and corresponding challenges to meet the CORAL requirements on tools and detail three of the challenges that required the most involvement. A usage scenario is provided to show how the tools may help users adopt the new computation environment and understand their application execution and the data flow at scale.
Citations: 0
Call for Code: Developers tackle natural disasters with software
IF 1.3 · CAS Zone 4, Computer Science · Q1 Computer Science · Pub Date: 2019-12-17 · DOI: 10.1147/JRD.2019.2960241
D. Krook;S. Malaika
Natural disasters are increasing as highlighted in many reports including the Borgen Project. In 2018, David Clark Cause as creator and IBM as founding partner, in partnership with the United Nations Human Rights Office, the American Red Cross International Team, and The Linux Foundation, issued a “Call for Code” to developers to create robust projects that prepare communities for natural disasters and help them respond more quickly in their aftermath. This article covers the steps and tools used to engage with developers, the results from the first of five competitions to be run by the Call for Code Global Initiative over five years, and how the winners were selected. Insights from the mobilization of 100,000 developers toward this cause are described, as well as the lessons learned from running large-scale hackathons.
Citations: 1
A unique approach to corporate disaster philanthropy focused on delivering technology and expertise
IF 1.3 · CAS Zone 4, Computer Science · Q1 Computer Science · Pub Date: 2019-12-17 · DOI: 10.1147/JRD.2019.2960244
R. E. Curzon;P. Curotto;M. Evason;A. Failla;P. Kusterer;A. Ogawa;J. Paraszczak;S. Raghavan
The role of corporations and their corporate social responsibility (CSR)-related response to disasters in support of their communities has not been extensively documented; thus, this article attempts to explain the role that one corporation, IBM, has played in disaster response and how it has used IBM and open-source technologies to deal with a broad range of disasters. These technologies range from advanced seismic monitoring and flood management to predicting and improving refugee flows. The article outlines various principles that have guided IBM in shaping its disaster response and provides some insights into various sources of useful data and applications that can be used in these critical situations. It also details one example of an emerging technology that is being used in these efforts.
Citations: 1
The CORAL supercomputer systems
IF 1.3 · CAS Zone 4, Computer Science · Q1 Computer Science · Pub Date: 2019-12-17 · DOI: 10.1147/JRD.2019.2960220
W. A. Hanson
In 2014, the U.S. Department of Energy (DoE) initiated a multiyear collaboration between Oak Ridge National Laboratory (ORNL), Argonne National Laboratory, and Lawrence Livermore National Laboratory (LLNL), known as “CORAL,” the next major phase in the DoE's scientific computing roadmap. The IBM CORAL systems are based on a fundamentally new data-centric architecture, where compute power is embedded everywhere data resides, combining powerful central processing units (CPUs) with graphics processing units (GPUs) optimized for scientific computing and artificial intelligence workloads. The IBM CORAL systems were built on the combination of mature technologies: 9th-generation POWER CPU, 6th-generation NVIDIA GPU, and 5th-generation Mellanox InfiniBand. These systems are providing scientists with computing power to solve challenges in many research areas beyond previously possible. This article provides an overview of the system solutions deployed at ORNL and LLNL.
Citations: 11
Porting a 3D seismic modeling code (SW4) to CORAL machines
IF 1.3 · CAS Zone 4, Computer Science · Q1 Computer Science · Pub Date: 2019-12-17 · DOI: 10.1147/JRD.2019.2960218
R. Pankajakshan;P.-H. Lin;B. Sjögreen
Seismic waves fourth order (SW4) solves the seismic wave equations on Cartesian and curvilinear grids using large compute clusters with O (100,000) cores. This article discusses the porting of SW4 to run on the CORAL architecture using the RAJA performance portability abstraction layer. The performances of key kernels using RAJA and CUDA are compared to estimate the performance penalty of using the portability abstraction layer. Code changes required for efficiency on GPUs and minimizing time spent in Message Passing Interface (MPI) are discussed. This article describes a path for efficiently porting large code bases to GPU-based machines while avoiding the pitfalls of a new architecture in the early stages of its deployment. Current bottlenecks in the code are discussed along with possible architectural or software mitigations. SW4 runs 28× faster on one 4-GPU CORAL node than on a CTS-1 node (Dual Intel Xeon E5-2695 v4). SW4 is now in routine use on problems of unprecedented resolution (203 billion grid points) and scale on 1,200 nodes of Summit.
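As a rough illustration of the portability layer discussed above (a minimal sketch, not code from SW4), the kernel below is written once with RAJA::forall and dispatched to either a CUDA or an OpenMP execution policy at compile time. The function name, arrays, and policy parameters are hypothetical; the guard macro and policies follow the RAJA API as commonly documented, and the data are assumed to live in GPU-accessible (e.g., unified) memory when the CUDA policy is selected.

```cpp
// Minimal, hypothetical sketch of a RAJA-portable kernel (not from SW4).
// The same loop body targets the GPU or the CPU depending on the policy.
#include <RAJA/RAJA.hpp>

void scale_add(double* y, const double* x, double a, int n)
{
#if defined(RAJA_ENABLE_CUDA)
  using exec_policy = RAJA::cuda_exec<256>;          // 256-thread CUDA blocks
#else
  using exec_policy = RAJA::omp_parallel_for_exec;   // CPU fallback (needs a RAJA build with OpenMP)
#endif

  // y and x are assumed to be in memory the chosen back end can reach
  // (e.g., CUDA unified memory when the CUDA policy is active).
  RAJA::forall<exec_policy>(RAJA::RangeSegment(0, n),
      [=] RAJA_HOST_DEVICE (int i) {
        y[i] = a * x[i] + y[i];
      });
}
```

The design point this sketch mirrors is the one the article measures: the kernel body is written once, and the cost of the abstraction is assessed by comparing against a hand-written CUDA version of the same loop.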
Citations: 2
Hybrid CPU/GPU tasks optimized for concurrency in OpenMP
IF 1.3 · CAS Zone 4, Computer Science · Q1 Computer Science · Pub Date: 2019-12-17 · DOI: 10.1147/JRD.2019.2960245
A. E. Eichenberger;G.-T. Bercea;A. Bataev;L. Grinberg;J. K. O'Brien
Sierra and Summit supercomputers exhibit a significant amount of intranode parallelism between the host POWER9 CPUs and their attached GPU devices. In this article, we show that exploiting device-level parallelism is key to achieving high performance by reducing overheads typically associated with CPU and GPU task execution. Moreover, manually exploiting this type of parallelism in large-scale applications is nontrivial and error-prone. We hide the complexity of exploiting this hybrid intranode parallelism using the OpenMP programming model abstraction. The implementation leverages the semantics of OpenMP tasks to express asynchronous task computations and their associated dependences. Launching tasks on the CPU threads requires a careful design of work-stealing algorithms to provide efficient load balancing among CPU threads. We propose a novel algorithm that removes locks from all task queueing operations that are on the critical path. Tasks assigned to GPU devices require additional steps such as copying input data to GPU devices, launching the computation kernels, and copying data back to the host CPU memory. We perform key optimizations to reduce the cost of these additional steps by tightly integrating data transfers and GPU computations into streams of asynchronous GPU operations. We further map high-level dependences between GPU tasks to the same asynchronous GPU streams to further avoid unnecessary synchronization. Results validate our approach.
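As a minimal, hypothetical sketch of the task-and-dependence style the abstract describes (not the authors' compiler or runtime work), the code below chains an asynchronous `target ... nowait` GPU region and a dependent CPU task through OpenMP `depend` clauses. All variable names are invented, and a compiler with OpenMP 4.5+ target offloading is assumed.

```cpp
// Hypothetical sketch of hybrid CPU/GPU tasking with OpenMP dependences.
#include <omp.h>
#include <cstdio>
#include <vector>

int main()
{
  const int n = 1 << 20;
  std::vector<double> a(n, 1.0), b(n, 2.0);
  double* pa = a.data();
  double* pb = b.data();
  double sum = 0.0;

  #pragma omp parallel
  #pragma omp single
  {
    // GPU task: 'nowait' turns the target region into a deferrable task;
    // the depend clause publishes pa as its output.
    #pragma omp target teams distribute parallel for \
            map(tofrom: pa[0:n]) map(to: pb[0:n]) nowait depend(out: pa[0])
    for (int i = 0; i < n; ++i)
      pa[i] += pb[i];

    // CPU task: scheduled only after the GPU task producing pa completes.
    #pragma omp task depend(in: pa[0]) shared(sum)
    {
      double s = 0.0;
      for (int i = 0; i < n; ++i) s += pa[i];
      sum = s;
    }

    #pragma omp taskwait   // wait for both sibling tasks
  }

  std::printf("sum = %f\n", sum);
  return 0;
}
```

The point of expressing both pieces as tasks, as the article argues, is that the runtime can overlap transfers, GPU kernels, and CPU work while the dependences keep the ordering correct without explicit synchronization in user code.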
Citations: 1
Quantitative modeling in disaster management: A literature review
IF 1.3 · CAS Zone 4, Computer Science · Q1 Computer Science · Pub Date: 2019-12-17 · DOI: 10.1147/JRD.2019.2960356
A. E. Baxter;H. E. Wilborn Lagerman;P. Keskinocak
The number, magnitude, complexity, and impact of natural disasters have been steadily increasing in various parts of the world. When preparing for, responding to, and recovering from a disaster, multiple organizations make decisions and take actions considering the needs, available resources, and priorities of the affected communities, emergency supply chains, and infrastructures. Most of the prior research focuses on decision-making for independent systems (e.g., single critical infrastructure networks or distinct relief resources). An emerging research area extends the focus to interdependent systems (i.e., multiple dependent networks or resources). In this article, we survey the literature on modeling approaches for disaster management problems on independent systems, discuss some recent work on problems involving demand, resource, and/or network interdependencies, and offer future research directions to add to this growing research area.
Citations: 12
Troubleshooting deep-learner training data problems using an evolutionary algorithm on Summit
IF 1.3 · CAS Zone 4, Computer Science · Q1 Computer Science · Pub Date: 2019-12-17 · DOI: 10.1147/JRD.2019.2960225
M. Coletti;A. Fafard;D. Page
Architectural and hyperparameter design choices can influence deep-learner (DL) model fidelity but can also be affected by malformed training and validation data. However, practitioners may spend significant time refining layers and hyperparameters before discovering that distorted training data were impeding the training progress. We found that an evolutionary algorithm (EA) can be used to troubleshoot this kind of DL problem. An EA evaluated thousands of DL configurations on Summit that yielded no overall improvement in DL performance, which suggested problems with the training and validation data. We suspected that contrast limited adaptive histogram equalization enhancement that was applied to previously generated digital surface models, for which we were training DLs to find errors, had damaged the training data. Subsequent runs with an alternative global normalization yielded significantly improved DL performance. However, the DL intersection over unions still exhibited consistent subpar performance, which suggested further problems with the training data and DL approach. Nonetheless, we were able to diagnose this problem within a 12-hour span via Summit runs, which prevented several weeks of unproductive trial-and-error DL configuration refinement and allowed for a more timely convergence on an ultimately viable solution.
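To make the search procedure concrete, here is a deliberately tiny, hypothetical mutation-plus-truncation-selection loop of the kind the abstract calls an evolutionary algorithm. It is not the authors' code: `evaluate` is a stand-in for training and scoring a real deep learner, and every name, range, and population size is invented. The diagnostic idea is visible in the loop: if the best fitness stays flat generation after generation, the data or the objective, not the hyperparameters, is the likely culprit.

```cpp
// Hypothetical sketch of an evolutionary search over deep-learner
// hyperparameters (not the authors' implementation).
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <random>
#include <vector>

struct Config {        // one candidate DL configuration
  int    layers;       // e.g., number of convolutional layers
  double learning_rate;
  double fitness;      // validation score filled in by evaluate()
};

// Stand-in fitness: in practice this would train the model and return a
// validation metric such as intersection over union.
double evaluate(const Config& c, std::mt19937& rng) {
  std::normal_distribution<double> noise(0.0, 0.05);
  return 1.0 / (1.0 + c.learning_rate * c.layers) + noise(rng);
}

int main() {
  std::mt19937 rng(42);
  std::uniform_int_distribution<int>     layer_dist(2, 12);
  std::uniform_real_distribution<double> lr_dist(1e-4, 1e-1);
  std::uniform_int_distribution<int>     step(-1, 1);
  std::normal_distribution<double>       scale(0.0, 0.2);

  std::vector<Config> pop(32);                        // random initial population
  for (auto& c : pop) c = {layer_dist(rng), lr_dist(rng), 0.0};

  for (int gen = 0; gen < 20; ++gen) {
    for (auto& c : pop) c.fitness = evaluate(c, rng);
    std::sort(pop.begin(), pop.end(),
              [](const Config& a, const Config& b) { return a.fitness > b.fitness; });

    // A best fitness that never improves across generations points at the
    // training data or the objective rather than the hyperparameters.
    std::printf("gen %2d  best fitness %.4f\n", gen, pop.front().fitness);

    // Keep the top half; refill the bottom half with mutated survivors.
    for (std::size_t i = pop.size() / 2; i < pop.size(); ++i) {
      Config child = pop[i - pop.size() / 2];
      child.layers        = std::max(1, child.layers + step(rng));
      child.learning_rate = std::max(1e-5, child.learning_rate * (1.0 + scale(rng)));
      pop[i] = child;
    }
  }
  return 0;
}
```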
Citations: 6
Summit and Sierra supercomputer cooling solutions
IF 1.3 · CAS Zone 4, Computer Science · Q1 Computer Science · Pub Date: 2019-12-10 · DOI: 10.1147/JRD.2019.2958902
S. Tian;T. Takken;V. Mahaney;C. Marroquin;M. Schultz;M. Hoffmeyer;Y. Yao;K. O'Connell;A. Yuksel;P. Coteus
Achieving optimal data center cooling efficiency requires effective water cooling of high-heat-density components, coupled with optimal warmer water temperatures and the correct order of water preheating from any air-cooled components. The Summit and Sierra supercomputers implemented efficient cooling by using high-performance cold plates to directly water-cool all central processing units (CPUs) and graphics processing units (GPUs) processors with warm inlet water. Cost performance was maximized by directly air-cooling the 10% to 15% of the compute drawer heat load generated by the lowest heat density components. For the Summit system, a rear-door heat exchanger allowed zero net heat load to air; the overall system efficiency was optimized by using the preheated water from the heat exchanger as an input to cool the higher power CPUs and GPUs.
Citations: 2