Pub Date: 2020-01-15; DOI: 10.1147/JRD.2020.2965881
L. Luo;T. P. Straatsma;L. E. Aguilar Suarez;R. Broer;D. Bykov;E. F. D'Azevedo;S. S. Faraji;K. C. Gottiparthi;C. De Graaf;J. A. Harris;R. W. A. Havenith;H. J. Aa. Jensen;W. Joubert;R. K. Kathir;J. Larkin;Y. W. Li;D. I. Lyakh;O. E. B. Messer;M. R. Norman;J. C. Oefelein;R. Sankaran;A. F. Tillack;A. L. Barnes;L. Visscher;J. C. Wells;M. Wibowo
High-performance computing (HPC) increasingly relies on heterogeneous architectures to achieve higher performance. At the Oak Ridge Leadership Computing Facility (OLCF), Oak Ridge, TN, USA, this trend continues with its latest supercomputer, Summit, which entered production in early 2019. The combination of IBM POWER9 CPUs and NVIDIA V100 GPUs, along with a fast NVLink2 interconnect and other recent technologies, pushes system performance to new heights and breaks the exascale barrier by certain measures. Because of Summit's powerful GPUs and much higher GPU-to-CPU ratio, offloading work to the accelerators is a requirement for any application that intends to use the system effectively. To help navigate a complex landscape of competing heterogeneous architectures, a collection of applications from a wide spectrum of scientific domains was selected for early adoption on Summit. This article summarizes the experience and lessons learned, in the hope of providing useful guidance on new programming challenges, such as scalability, performance portability, and software maintainability, for future application development on heterogeneous HPC systems.
Title: Pre-exascale accelerated application development: The ORNL Summit experience. IBM Journal of Research and Development, vol. 64, no. 3/4, pp. 11:1-11:21.
Pub Date: 2020-01-03; DOI: 10.1147/JRD.2019.2963637
S. Roberts;C. Mann;C. Marroquin
Stipulations in the 2014 Collaboration of Oak Ridge, Argonne, and Livermore (CORAL) joint procurement motivated a fundamental change in IBM's high-performance computer design, refocusing IBM Power Systems on compute nodes that can scale to 200 petaflops with access to 2.5 PB of memory; the resulting design also serves the commercial market for single-server applications. Distributing both processing elements and memory required a careful look at data movement. The resulting AC922 POWER9 system features NVIDIA V100 GPUs with cache-line access granularity, more than double the I/O bandwidth of PCIe Gen3, and low-latency interfaces, interconnected by state-of-the-art dual-rail Mellanox CAPI EDR HCAs running at 50 Gb/s. With processing units designed to operate at 250 and 300 W, a single system can draw up to 3,080 W. The overall CORAL solutions placed in the top ten of the Green500 energy-efficiency ranking. Previous POWER designs used uniquely designed cabinets and scaled-up infrastructure to achieve efficiency; for commercial success, our design uses industry-standard 19-in drawers and racks. Both air- and water-cooled solutions allow use in a wide range of customer environments. This article documents the novel design features that facilitate data movement and enable new coherent programming models. It describes how three generations of system designs became the foundation for fulfilling the CORAL contract and illustrates key features and specifications of the final product.
Title: Redefining IBM power system design for CORAL. IBM Journal of Research and Development, vol. 64, no. 3/4, pp. 2:1-2:10.
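To put the per-device power figures in perspective, here is a back-of-the-envelope sketch of one node's processor power budget. It assumes two POWER9 sockets per node with either four or six V100 GPUs (Sierra nodes use four, Summit nodes six), and it assumes the 250 W figure applies to each CPU and 300 W to each GPU; the abstract does not state that mapping. Memory, fans, network adapters, and power-conversion losses are excluded, so a whole-system figure will be higher.

```python
# Assumed per-device design power (the abstract says "250 and 300 W" without
# stating which device each applies to): 250 W per POWER9 socket, 300 W per V100.
CPU_W = 250
GPU_W = 300

def processor_power_w(n_cpus, n_gpus, cpu_w=CPU_W, gpu_w=GPU_W):
    """Sum of processor design power for one node, in watts.
    Excludes memory, fans, NICs, and power-supply losses."""
    return n_cpus * cpu_w + n_gpus * gpu_w

for label, gpus in (("4-GPU node", 4), ("6-GPU node", 6)):
    print(label, processor_power_w(2, gpus), "W")
```

Under these assumptions the processors alone account for 1,700 W in a four-GPU node and 2,300 W in a six-GPU node.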
Pub Date: 2020-01-01; DOI: 10.1147/JRD.2020.2966837
Title: Preface: Disaster Response and Management. IBM Journal of Research and Development, vol. 64, no. 1/2, pp. 1-3.
Pub Date: 2019-12-31; DOI: 10.35429/jrd.2019.16.5.12.20
Eli González-Durán, M. Zamora-Antuñano, L. Lira-Cortes, N. Méndez-Lozano
The Centro Nacional de Metrología, in collaboration with the Instituto Tecnológico de Celaya, is developing a reference calorimeter to measure the superior (gross) calorific value of natural gas. We compare two formulations of the combustion-chamber study: a steady-state formulation (already published) and a transient formulation. The combustion chamber is studied with computational fluid dynamics (CFD) in FLUENT®, with specific parameters set to define and simulate the combustion process, including the exchange of energy, momentum, and mass. Both the steady-state and the transient simulations use the Eddy Dissipation Model (EDM). Two chamber geometries are simulated: one with a cylindrical body and a hemispherical lid, and an elliptical one, proposed to increase the area for heat transfer to the surrounding medium (water, in this case). The selection criterion is the lowest exit temperature of the combustion exhaust gases. The cylindrical chamber with a hemispherical lid meets this criterion within the first 4 seconds, with an exit temperature 0.4 °C lower than that of the elliptical chamber.
Title: Numerical Simulation of the Combustion Chamber for a New Reference Combustion Calorimeter. IBM Journal of Research and Development.
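The abstract names the Eddy Dissipation Model but not its rate expression. As a hedged sketch, the form documented in the ANSYS Fluent theory guide (after Magnussen and Hjertager) takes the rate of a species in a reaction as the smaller of a reactant-limited and a product-limited mixing rate, both scaled by the eddy turnover rate eps/k, with empirical constants A = 4.0 and B = 0.5. The function and argument names below are illustrative, not part of any FLUENT interface, and the test values are made up.

```python
def edm_rate(nu_i, mw_i, rho, k, eps, reactants, products, A=4.0, B=0.5):
    """Eddy-dissipation rate of species i in one reaction (illustrative sketch).

    nu_i, mw_i : stoichiometric coefficient and molar mass of species i
    rho        : mixture density
    k, eps     : turbulent kinetic energy and its dissipation rate
    reactants  : list of (Y_R, nu'_R, M_R) tuples, one per reactant
    products   : list of (Y_P, nu''_P, M_P) tuples, one per product
    Molar masses only appear as ratios, so any consistent unit works.
    """
    # Mixing rate: eps/k is the inverse eddy turnover time.
    mixing = A * rho * eps / k
    # Reactant limit: the scarcest reactant relative to its stoichiometry.
    reactant_limit = min(y / (nu * mw) for y, nu, mw in reactants)
    # Product limit: B * sum(Y_P) / sum(nu''_j * M_j) over the products.
    product_limit = B * sum(y for y, _, _ in products) / sum(
        nu * mw for _, nu, mw in products)
    return nu_i * mw_i * mixing * min(reactant_limit, product_limit)
```

Because the rate depends only on eps/k and the species mass fractions, with no Arrhenius term, the EDM assumes the chemistry is fast compared with turbulent mixing.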
Pub Date: 2019-12-31; DOI: 10.35429/10.35429/jrd.2019.16.5.28.37
Graciella Rosado-Vila, Rafael Zapata-May, Fátima Sansores-Ambrosio, Jorge Vidal-Paredes
Introduction: Insulin is a hormone secreted by the pancreas that controls blood-sugar concentration. The most common type of diabetes is type 2, which accounts for 90 to 95% of cases. The most frequent oral (stomatological) alterations are periodontal disease, gingivitis, caries, and xerostomia (dry-mouth syndrome), so it is necessary to investigate how susceptible these patients are to such conditions in order to take the necessary preventive measures. […] had similar plaque levels. Results: The sample comprised 100 patients, 49 female (49%) and 51 male (51%). The mean age was 54.89 ± 10.85 years, with ages ranging from 40 to 70 years. The largest age group was 40 to 50 years (39%), followed by 51 to 60 years (37%) and 61 to 70 years (24%). On the gingival index, 45% of the patients presented mild gingivitis, 13% moderate gingivitis, and 21% severe gingivitis.
Title: Oral health in patients with diabetes mellitus type 2 from the faculty of dentistry in San Francisco de Campeche 2016. IBM Journal of Research and Development.
Pub Date: 2019-12-31; DOI: 10.35429/jrd.2019.16.5.21.27
J. J. Esqueda-Elizondo, D. A. Trujillo-Toledo, M. A. Pinto-Ramos, Roberto Alejandro Reyes-Martínez
A methodology for selecting and determining electroencephalographic (EEG) signal patterns is presented as a case study; the patterns can later be used as on-off control signals in other applications. EEG signals are acquired through a brain-computer interface (BCI). Such systems capture electrical signals from the cerebral cortex and transfer them to a computer, where algorithms analyze them and trigger an action. In this case, the EEG signals are acquired with the wireless 14-channel Epoc+ platform. The methodology first acquires signals from the user sample in three scenarios: relaxation, thinking about turning on, and thinking about turning off. Next, the wavelet transform of each channel is computed for each case, and the most significant coefficients are retained. Digital signal processing algorithms then derive descriptive parameters for the on and off cases, which serve as patterns describing each action. Finally, incoming signals are compared against the stored patterns to execute one of the established commands.
Title: Methodology for pattern determination in electroencephalographic signals. IBM Journal of Research and Development.
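The pipeline in this abstract (per-channel wavelet transform, retention of the most significant coefficients, descriptive parameters, comparison against stored patterns) can be sketched in a few functions. The abstract does not name the wavelet, the parameters, or the comparison metric, so this sketch assumes a Haar wavelet, uses the energy and mean absolute value of the top-k coefficients as the descriptive parameters, and picks the nearest stored pattern by Euclidean distance. All function names are hypothetical.

```python
import math

def haar_dwt(signal):
    """One level of the orthonormal Haar DWT; returns (approximation, detail)."""
    signal = list(signal)
    if len(signal) % 2:
        signal.append(signal[-1])  # pad odd-length input by repeating the last sample
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail

def wavelet_features(signal, levels=3, top_k=4):
    """Multi-level Haar decomposition of one channel. Keeps the top_k
    largest-magnitude coefficients and summarizes them as [energy, mean |c|]."""
    coeffs, approx = [], list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        coeffs.extend(detail)
    coeffs.extend(approx)
    significant = sorted(coeffs, key=abs, reverse=True)[:top_k]
    energy = sum(c * c for c in significant)
    mean_abs = sum(abs(c) for c in significant) / len(significant)
    return [energy, mean_abs]

def channel_features(channels, **kw):
    """Concatenate the features of every channel (14 for an Epoc+) into one vector."""
    vec = []
    for ch in channels:
        vec.extend(wavelet_features(ch, **kw))
    return vec

def classify(sample_channels, patterns, **kw):
    """Return the label whose stored pattern vector is nearest in Euclidean distance."""
    feats = channel_features(sample_channels, **kw)
    return min(patterns, key=lambda label: math.dist(feats, patterns[label]))
```

Stored patterns would be built by calling `channel_features` on recordings from the on and off scenarios (for example, averaging several trials) and keeping the vectors in a dict keyed by command.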
Pub Date: 2019-12-31; DOI: 10.35429/jrd.2019.16.5.1.6
Silvino Rojas-Escobar, B. González-Contreras, Patricia Jaramillo-Quintero, Antonio Guevara-García
Industrial-scale bioreactors for gaseous biofuels are a field of research worldwide. Automation at a technically and economically profitable level has not been possible because biological systems fluctuate. Quantifying biogas in continuous flow is difficult to implement with gas chromatography (GC) and very expensive because of the special sensors required. In this work, we developed a system that uses MQ8 hydrogen and MQ4 methane sensors, normally used to detect industrial leaks, to determine gas concentration. The sensors were installed on Arduino boards and programmed to plot the concentration in real time. Calibration curves were built for these sensors using a standardized gas mixture in hermetic jars of known volume. The response is exponential and reproducible, and no interference from other gases was observed with real biogas samples. The prototypes cost very little compared with GC equipment and can be installed at the gas outlet of bioreactors with a mechatronic system that monitors the composition in real time, making it possible to obtain microbial kinetics in semi-continuous flow very economically.
Title: Low-cost method for quantification of hydrogen and methane in continuous flow bioreactors. IBM Journal of Research and Development.
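The abstract reports that the sensor calibration curves are exponential but does not give the equation. A minimal sketch of fitting such a curve, assuming the model concentration = a·exp(b·reading), where reading is the raw value the Arduino reports for each calibration mixture. The model's dependence on the raw reading and the function names are assumptions for illustration.

```python
import math

def fit_exponential(readings, concentrations):
    """Least-squares fit of concentration = a * exp(b * reading) on ln(concentration).

    readings       : raw sensor values, one per calibration mixture (hypothetical units)
    concentrations : known concentrations of those mixtures (all > 0)
    """
    n = len(readings)
    if n < 2 or n != len(concentrations):
        raise ValueError("need at least two paired calibration points")
    logs = [math.log(c) for c in concentrations]
    mx = sum(readings) / n
    my = sum(logs) / n
    sxx = sum((x - mx) ** 2 for x in readings)
    sxy = sum((x - mx) * (ly - my) for x, ly in zip(readings, logs))
    b = sxy / sxx              # slope of ln(concentration) versus reading
    a = math.exp(my - b * mx)  # back-transformed intercept
    return a, b

def concentration(reading, a, b):
    """Convert a live sensor reading to concentration with the fitted curve."""
    return a * math.exp(b * reading)
```

Fitting on the logarithm keeps the sketch dependency-free but weights relative rather than absolute errors; a nonlinear least-squares fit is an alternative when absolute error matters more.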
Pub Date: 2019-12-31; DOI: 10.35429/jrd.2019.16.5.7.11
M. Prado-Salazar, José Gabriel Barboza-Briones
This project aims to produce electricity with a stationary bicycle modified to take advantage of both wheels. Two dynamos placed in friction contact with the tires convert mechanical energy into electrical energy, enough to recharge a cell phone. At the same time, the phone no longer draws electricity from the supply network, which would represent energy and cost savings if scaled to large numbers of cell phones. This form of alternative power generation also emits no greenhouse gases into the atmosphere, which benefits both health and the environment. The system charges cell phones in an environmentally friendly way, and pedaling while charging is entertaining and a healthy way to keep in shape, which has made it a center of attention for students. The circuit delivers 5 V and 0.7 A of direct current, charging 15% of a cell phone battery in approximately 15 minutes.
Title: Transformation of kinetic energy to electrical energy through a static system to recharge electronic devices. IBM Journal of Research and Development.
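The charging figures in this abstract (5 V, 0.7 A, about 15 minutes, 15% of a battery) can be checked with a short worked calculation. The 3.7 V nominal cell voltage is an assumption typical of lithium-ion phone batteries, and charging losses are ignored, so the implied capacity is only an upper-bound consistency check. The function names are hypothetical.

```python
def delivered_energy_wh(voltage_v, current_a, minutes):
    """Energy delivered at constant DC voltage and current, in watt-hours."""
    return voltage_v * current_a * minutes / 60.0

def implied_capacity_wh(delivered_wh, fraction_charged):
    """Capacity implied if delivered_wh raised the state of charge by
    fraction_charged. Charging losses are ignored (assumption)."""
    return delivered_wh / fraction_charged

def wh_to_mah(energy_wh, nominal_v=3.7):
    """Express energy as charge at an assumed nominal cell voltage."""
    return energy_wh / nominal_v * 1000.0

energy = delivered_energy_wh(5.0, 0.7, 15)
capacity = implied_capacity_wh(energy, 0.15)
print(f"{energy:.3f} Wh delivered, implied capacity {capacity:.2f} Wh "
      f"(about {wh_to_mah(capacity):.0f} mAh)")
```

At 5 V and 0.7 A the dynamos deliver 3.5 W, or 0.875 Wh in 15 minutes; if that raised the charge by 15%, the implied battery holds at most about 5.8 Wh, roughly 1,580 mAh at 3.7 V.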
Pub Date: 2019-12-25; DOI: 10.1147/JRD.2019.2962428
E. Tiotto;B. Mahjour;W. Tsang;X. Xue;T. Islam;W. Chen
The ability to efficiently offload computational workloads to graphics processing units (GPUs) is critical to the success of hybrid CPU–GPU architectures such as the Summit and Sierra supercomputing systems. OpenMP 4.5 is a high-level programming model that enables the development of architecture- and accelerator-independent applications. This article describes aspects of the OpenMP implementation in the IBM XL C/C++ and XL Fortran compilers that help programmers achieve performance objectives. This includes an interprocedural static analysis that the XL optimizer uses to specialize code generation of the OpenMP distribute parallel do