Challenges and Opportunities for RISC-V Architectures towards Genomics-based Workloads
Pub Date : 2023-06-27 | DOI: 10.48550/arXiv.2306.15562
Gonzalo Gómez-Sánchez, A. Call, Xavier Teruel, Lorena Alonso, Ignasi Morán, Miguel Angel Perez, D. Torrents, J. L. Berral
The use of large-scale supercomputing architectures is a hard requirement for scientific Big-Data applications. An example is genomics analytics, where millions of data transformations and tests per patient need to be performed to find relevant clinical indicators. Therefore, to ensure open and broad access to high-performance technologies, governments and academia are pushing toward the introduction of novel computing architectures in large-scale scientific environments. This is the case of RISC-V, an open-source and royalty-free instruction-set architecture. To evaluate such technologies, here we present the Variant-Interaction Analytics use case, its benchmarking suite, and datasets. Through this use case, we search for possible genetic interactions using computational and statistical methods, providing a representative case of heavy ETL (Extract, Transform, Load) data processing. Current implementations run on x86-based supercomputers (e.g., MareNostrum-IV at the Barcelona Supercomputing Center (BSC)), and future steps propose RISC-V as part of the next MareNostrum generations. Here we describe the Variant Interaction use case, highlighting the characteristics that leverage high-performance computing and indicating the caveats and challenges for upcoming RISC-V developments and designs, drawing on a first comparison between x86 and RISC-V architectures running real Variant Interaction executions on real hardware.
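To make the shape of such a workload concrete, below is a minimal C sketch of the kind of per-pair statistical test a variant-interaction pipeline runs millions of times: aggregate genotype pairs into a contingency table (the ETL step) and score the pair. The genotype coding (0/1/2 minor-allele counts) and the chi-square statistic are illustrative assumptions; the abstract does not specify the suite's actual methods.

```c
/* Hypothetical sketch of one variant-pair interaction test.
 * Assumed: genotypes coded 0/1/2; chi-square as the test statistic. */
#include <stdio.h>

/* Chi-square statistic over a 3x3 contingency table of genotype
 * combinations (AA/Aa/aa for each of the two variants). */
double chi_square_3x3(const long obs[3][3]) {
    long row[3] = {0}, col[3] = {0}, total = 0;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) {
            row[i] += obs[i][j];
            col[j] += obs[i][j];
            total  += obs[i][j];
        }
    double chi2 = 0.0;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) {
            double expected = (double)row[i] * col[j] / total;
            if (expected > 0.0) {
                double d = obs[i][j] - expected;
                chi2 += d * d / expected;
            }
        }
    return chi2;
}

int main(void) {
    int n = 6;
    /* genotypes of two variants across n patients, coded 0/1/2 */
    int va[] = {0, 1, 2, 1, 0, 2};
    int vb[] = {0, 1, 1, 2, 0, 2};
    long obs[3][3] = {{0}};
    for (int p = 0; p < n; p++)
        obs[va[p]][vb[p]]++;  /* ETL step: aggregate genotype pairs */
    printf("chi2 = %f\n", chi_square_3x3(obs));
    return 0;
}
```

Scaled to all pairs of millions of variants, this tiny kernel is what makes the workload both compute- and data-movement-heavy.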
{"title":"Challenges and Opportunities for RISC-V Architectures towards Genomics-based Workloads","authors":"Gonzalo Gómez-Sánchez, A. Call, Xavier Teruel, Lorena Alonso, Ignasi Morán, Miguel Angel Perez, D. Torrents, J. L. Berral","doi":"10.48550/arXiv.2306.15562","DOIUrl":"https://doi.org/10.48550/arXiv.2306.15562","url":null,"abstract":"The use of large-scale supercomputing architectures is a hard requirement for scientific computing Big-Data applications. An example is genomics analytics, where millions of data transformations and tests per patient need to be done to find relevant clinical indicators. Therefore, to ensure open and broad access to high-performance technologies, governments, and academia are pushing toward the introduction of novel computing architectures in large-scale scientific environments. This is the case of RISC-V, an open-source and royalty-free instruction-set architecture. To evaluate such technologies, here we present the Variant-Interaction Analytics use case benchmarking suite and datasets. Through this use case, we search for possible genetic interactions using computational and statistical methods, providing a representative case for heavy ETL (Extract, Transform, Load) data processing. Current implementations are implemented in x86-based supercomputers (e.g. MareNostrum-IV at the Barcelona Supercomputing Center (BSC)), and future steps propose RISC-V as part of the next MareNostrum generations. Here we describe the Variant Interaction Use Case, highlighting the characteristics leveraging high-performance computing, indicating the caveats and challenges towards the next RISC-V developments and designs to come from a first comparison between x86 and RISC-V architectures on real Variant Interaction executions over real hardware implementations.","PeriodicalId":345133,"journal":{"name":"ISC Workshops","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116139531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Software Development Vehicles to enable extended and early co-design: a RISC-V and HPC case of study
Pub Date : 2023-06-01 | DOI: 10.48550/arXiv.2306.01797
F. Mantovani, Pablo Vizcaino, Fabio Banchelli, M. Garcia-Gasulla, R. Ferrer, Giorgos Ieronymakis, Nikos Dimou, Vassilis D. Papaefstathiou, Jesús Labarta
Prototyping HPC systems with low-to-mid technology readiness level (TRL) components is critical for providing feedback to hardware designers, the system software team (e.g., compiler developers), and early adopters from the scientific community. The typical approach to hardware design and HPC system prototyping often limits feedback or only allows it at a late stage. In this paper, we present a set of tools for co-designing HPC systems, called software development vehicles (SDV). We use an innovative RISC-V design as a demonstrator, which includes a scalar CPU and a vector processing unit capable of operating on large vectors of up to 16 kbits. We provide an incremental methodology and early tangible evidence of a co-design process that feeds improvements back to both architecture and system software at a very early stage of system development.
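A hedged illustration of what a 16-kbit vector unit means for code: a vector-length-agnostic loop in C (the daxpy kernel below is my example, not taken from the paper) compiles unchanged for any vector length, which is what lets the same source exercise both the prototype and conventional hardware.

```c
/* Vector-length-agnostic daxpy: the stripmined loop shape a
 * vectorising compiler maps onto whatever hardware vector length the
 * target provides. On a 16-kbit vector unit, one vector register can
 * hold 16384 / 64 = 256 doubles, so each vector-loop iteration would
 * cover 256 elements; on a 128-bit unit, only 2. The C source is the
 * same either way, which is the point for an SDV-style workflow. */
void daxpy(long n, double a, const double *x, double *y) {
    for (long i = 0; i < n; i++)
        y[i] += a * x[i];
}
```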
{"title":"Software Development Vehicles to enable extended and early co-design: a RISC-V and HPC case of study","authors":"F. Mantovani, Pablo Vizcaino, Fabio Banchelli, M. Garcia-Gasulla, R. Ferrer, Giorgos Ieronymakis, Nikos Dimou, Vassilis D. Papaefstathiou, Jesús Labarta","doi":"10.48550/arXiv.2306.01797","DOIUrl":"https://doi.org/10.48550/arXiv.2306.01797","url":null,"abstract":"Prototyping HPC systems with low-to-mid technology readiness level (TRL) systems is critical for providing feedback to hardware designers, the system software team (e.g., compiler developers), and early adopters from the scientific community. The typical approach to hardware design and HPC system prototyping often limits feedback or only allows it at a late stage. In this paper, we present a set of tools for co-designing HPC systems, called software development vehicles (SDV). We use an innovative RISC-V design as a demonstrator, which includes a scalar CPU and a vector processing unit capable of operating large vectors up to 16 kbits. We provide an incremental methodology and early tangible evidence of the co-design process that provide feedback to improve both architecture and system software at a very early stage of system development.","PeriodicalId":345133,"journal":{"name":"ISC Workshops","volume":"2015 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127746546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Backporting RISC-V Vector assembly
Pub Date : 2023-04-20 | DOI: 10.48550/arXiv.2304.10324
Joseph K. L. Lee, Maurice Jamieson, Nick Brown
Leveraging vectorisation, the ability of a CPU to apply operations to multiple elements of data concurrently, is critical for high-performance workloads. However, at the time of writing, commercially available physical RISC-V hardware that provides the RISC-V Vector extension (RVV) only supports version 0.7.1, which is incompatible with the latest ratified version 1.0. The challenge is that upstream compiler toolchains, such as Clang, only target the ratified v1.0 and do not support the older v0.7.1. Because v1.0 is not backwards compatible with v0.7.1, the only way to program vectorised code has been to use an older, vendor-provided compiler. In this paper we introduce the rvv-rollback tool, which translates assembly code generated by a compiler using vector extension v1.0 instructions into v0.7.1. We use this tool to compare the vectorisation performance of the vendor-provided GNU 8.4 compiler (which supports v0.7.1) against LLVM 15.0 (which supports only v1.0), finding that the LLVM compiler auto-vectorises more computational kernels and delivers greater performance than GNU in most, but not all, cases. We also tested LLVM vectorisation with vector-length-agnostic and vector-length-specific settings, and observed cases with significant differences in performance.
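To make the incompatibility tangible, here is a trivial kernel with the kind of instruction-level rewrite such a translation performs shown as comments. The mnemonics are assumptions drawn from the two spec versions, not output copied from rvv-rollback itself.

```c
/* Illustrative v1.0 -> v0.7.1 mapping for the loop below.
 *
 * RVV v1.0 (as emitted by LLVM 15):
 *     vsetvli t0, a0, e32, m1, ta, ma   # tail/mask policy operands
 *     vle32.v v8, (a1)                  # element width in the mnemonic
 * RVV v0.7.1 (as accepted by the vendor GNU 8.4 toolchain; assumed
 * from the 0.7.1 spec):
 *     vsetvli t0, a0, e32, m1           # no ta/ma policy operands
 *     vle.v   v8, (a1)                  # element width taken from SEW
 */
void scale(int n, float a, float *x) {
    for (int i = 0; i < n; i++)
        x[i] *= a;   /* auto-vectorised into sequences like the above */
}
```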
{"title":"Backporting RISC-V Vector assembly","authors":"Joseph K. L. Lee, Maurice Jamieson, Nick Brown","doi":"10.48550/arXiv.2304.10324","DOIUrl":"https://doi.org/10.48550/arXiv.2304.10324","url":null,"abstract":"Leveraging vectorisation, the ability for a CPU to apply operations to multiple elements of data concurrently, is critical for high performance workloads. However, at the time of writing, commercially available physical RISC-V hardware that provides the RISC-V vector extension (RVV) only supports version 0.7.1, which is incompatible with the latest ratified version 1.0. The challenge is that upstream compiler toolchains, such as Clang, only target the ratified v1.0 and do not support the older v0.7.1. Because v1.0 is not compatible with v0.7.1, the only way to program vectorised code is to use a vendor-provided, older compiler. In this paper we introduce the rvv-rollback tool which translates assembly code generated by the compiler using vector extension v1.0 instructions to v0.7.1. We utilise this tool to compare vectorisation performance of the vendor-provided GNU 8.4 compiler (supports v0.7.1) against LLVM 15.0 (supports only v1.0), where we found that the LLVM compiler is capable of auto-vectorising more computational kernels, and delivers greater performance than GNU in most, but not all, cases. We also tested LLVM vectorisation with vector length agnostic and specific settings, and observed cases with significant difference in performance.","PeriodicalId":345133,"journal":{"name":"ISC Workshops","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116499860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Test-driving RISC-V Vector hardware for HPC
Pub Date : 2023-04-20 | DOI: 10.48550/arXiv.2304.10319
Joseph K. L. Lee, Maurice Jamieson, Nick Brown, Ricardo Jesus
Whilst the RISC-V Vector extension (RVV) has been ratified, at the time of writing both hardware implementations and open-source software support for vectorisation on RISC-V remain limited. This matters because vectorisation is crucial to obtaining good performance for High Performance Computing (HPC) workloads and, as of April 2023, the Allwinner D1 SoC, containing the XuanTie C906 processor, is the only mass-produced and commercially available hardware supporting RVV. This paper surveys the current state of RISC-V vectorisation as of 2023, reporting on the landscape of both the hardware and software ecosystems. Driving our discussion from experiences in setting up the Allwinner D1 as part of the EPCC RISC-V testbed, we report the results of benchmarking the D1 using the RAJA Performance Suite, which demonstrated reasonable vectorisation speedup with the vendor-provided compiler, as well as favourable performance compared to the StarFive VisionFive V2 with SiFive's U74 processor.
{"title":"Test-driving RISC-V Vector hardware for HPC","authors":"Joseph K. L. Lee, Maurice Jamieson, Nick Brown, Ricardo Jesus","doi":"10.48550/arXiv.2304.10319","DOIUrl":"https://doi.org/10.48550/arXiv.2304.10319","url":null,"abstract":"Whilst the RISC-V Vector extension (RVV) has been ratified, at the time of writing both hardware implementations and open source software support are still limited for vectorisation on RISC-V. This is important because vectorisation is crucial to obtaining good performance for High Performance Computing (HPC) workloads and, as of April 2023, the Allwinner D1 SoC, containing the XuanTie C906 processor, is the only mass-produced and commercially available hardware supporting RVV. This paper surveys the current state of RISC-V vectorisation as of 2023, reporting the landscape of both the hardware and software ecosystem. Driving our discussion from experiences in setting up the Allwinner D1 as part of the EPCC RISC-V testbed, we report the results of benchmarking the Allwinner D1 using the RAJA Performance Suite, which demonstrated reasonable vectorisation speedup using vendor-provided compiler, as well as favourable performance compared to the StarFive VisionFive V2 with SiFive's U74 processor.","PeriodicalId":345133,"journal":{"name":"ISC Workshops","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116493164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Portability and Scalability of OpenMP Offloading on State-of-the-art Accelerators
Pub Date : 2023-04-09 | DOI: 10.48550/arXiv.2304.04276
Yehonatan Fridman, G. Tamir, Gal Oren
Over the last decade, most of the increase in computing power has come from advances in accelerated many-core architectures, mainly in the form of GPGPUs. While accelerators achieve phenomenal performance in various computing tasks, utilising them requires code adaptations and transformations. Thus OpenMP, the most common standard for multi-threading in scientific computing applications, has provided offloading capabilities between hosts (CPUs) and accelerators since v4.0, with increasing support in the successive v4.5, v5.0, v5.1, and the latest v5.2 versions. Recently, two state-of-the-art GPUs, the Intel Ponte Vecchio Max 1100 and the NVIDIA A100, were released to the market, with the oneAPI and NVHPC compilers for offloading, respectively. In this work, we present early performance results of OpenMP offloading to these devices, specifically analysing the portability of advanced directives (using SOLLVE's OMPVV test suite) and the scalability of the hardware on a representative scientific mini-app (the LULESH benchmark). Our results show that coverage of version 4.5 is nearly complete in both the latest NVHPC and oneAPI tools. However, we observed a lack of support for versions 5.0, 5.1, and 5.2, which is particularly noticeable with NVHPC. From the performance perspective, we found the PVC1100 and A100 to be broadly comparable on the LULESH benchmark: while the A100 is slightly faster due to higher memory bandwidth, the PVC1100 scales to the next problem size (400^3) thanks to its larger memory.
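For readers unfamiliar with the directive style under test, a minimal sketch of OpenMP 4.5-era offloading follows. The kernel is my own example, not one from the paper; the compiler flags in the comment are indicative and should be checked against the respective toolchain documentation.

```c
/* Minimal OpenMP target-offload example. Indicatively, this builds
 * for an A100 with NVHPC (nvc -mp=gpu) and for Ponte Vecchio with
 * oneAPI (icx -fiopenmp -fopenmp-targets=spir64). */
#include <stdio.h>

int main(void) {
    const int n = 1 << 20;
    static double x[1 << 20], y[1 << 20];
    for (int i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }

    /* Map inputs to the device, spread the loop across GPU teams and
     * threads, and copy y back to the host when the region ends. */
    #pragma omp target teams distribute parallel for \
            map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++)
        y[i] += 3.0 * x[i];

    printf("y[0] = %f\n", y[0]); /* expect 5.0 */
    return 0;
}
```

The later 5.x directives whose coverage the paper probes (via OMPVV) layer features such as unified shared memory and richer mapping semantics on top of this basic pattern.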
{"title":"Portability and Scalability of OpenMP Offloading on State-of-the-art Accelerators","authors":"Yehonatan Fridman, G. Tamir, Gal Oren","doi":"10.48550/arXiv.2304.04276","DOIUrl":"https://doi.org/10.48550/arXiv.2304.04276","url":null,"abstract":"Over the last decade, most of the increase in computing power has been gained by advances in accelerated many-core architectures, mainly in the form of GPGPUs. While accelerators achieve phenomenal performances in various computing tasks, their utilization requires code adaptations and transformations. Thus, OpenMP, the most common standard for multi-threading in scientific computing applications, introduced offloading capabilities between host (CPUs) and accelerators since v4.0, with increasing support in the successive v4.5, v5.0, v5.1, and the latest v5.2 versions. Recently, two state-of-the-art GPUs -- the Intel Ponte Vecchio Max 1100 and the NVIDIA A100 GPUs -- were released to the market, with the oneAPI and NVHPC compilers for offloading, correspondingly. In this work, we present early performance results of OpenMP offloading capabilities to these devices while specifically analyzing the portability of advanced directives (using SOLLVE's OMPVV test suite) and the scalability of the hardware in representative scientific mini-app (the LULESH benchmark). Our results show that the coverage for version 4.5 is nearly complete in both latest NVHPC and oneAPI tools. However, we observed a lack of support in versions 5.0, 5.1, and 5.2, which is particularly noticeable when using NVHPC. From the performance perspective, we found that the PVC1100 and A100 are relatively comparable on the LULESH benchmark. While the A100 is slightly better due to faster memory bandwidth, the PVC1100 reaches the next problem size (400^3) scalably due to the larger memory size.","PeriodicalId":345133,"journal":{"name":"ISC Workshops","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127897643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Precise Energy Consumption Measurements of Heterogeneous Artificial Intelligence Workloads
Pub Date : 2022-12-03 | DOI: 10.48550/arXiv.2212.01698
R. Caspart, Sebastian Ziegler, Arvid Weyrauch, Holger Obermaier, Simon Raffeiner, Leonie Schuhmacher, J. Scholtyssek, D. Trofimova, M. Nolden, I. Reinartz, Fabian Isensee, Markus Goetz, C. Debus
With the rise of artificial intelligence (AI) in recent years and the subsequent increase in the complexity of the applied models, the growing demand for computational resources is starting to pose a significant challenge. The need for higher compute power is being met with increasingly potent accelerator hardware as well as the use of large and powerful compute clusters. However, the gain in prediction accuracy from large models trained on distributed and accelerated systems ultimately comes at the price of a substantial increase in energy demand, and researchers have started questioning the environmental friendliness of such AI methods at scale. Consequently, awareness of energy efficiency plays an important role for AI model developers and hardware infrastructure operators alike. The energy consumption of AI workloads depends both on the model implementation and on the composition of the hardware used. Therefore, accurate measurement of the power draw of AI workflows on different types of compute nodes is key to algorithmic improvements and to the design of future compute clusters and hardware. Towards this end, we present measurements of the energy consumption of two typical applications of deep learning models on different types of heterogeneous compute nodes. Our results indicate that (1) contrary to common approaches, deriving energy consumption directly from runtime is not accurate; the composition of the compute node needs to be taken into account; (2) neglecting accelerator hardware on mixed nodes results in disproportionate energy inefficiency; and (3) the energy consumption of model training and inference should be considered separately: while training on GPUs outperforms all other node types in both runtime and energy consumption, inference on CPU nodes can be comparably efficient. One advantage of our approach is that the information on energy consumption is available to all users of the supercomputer, not just those with administrator rights, enabling easy transfer to other workloads alongside an increase in user awareness of energy consumption.
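In the spirit of finding (1), here is a minimal sketch of measuring CPU-package energy from a hardware counter rather than deriving it from runtime. It uses the Linux powercap/RAPL sysfs interface; the path below is Intel-specific and varies per system, the counter wraps at max_energy_range_uj (wrap handling omitted), and GPU energy would need a separate source such as NVML. This is not the paper's measurement setup, just an accessible, unprivileged-user approach of the same kind.

```c
/* Sketch: read package energy before and after a workload.
 * Assumed path; adjust intel-rapl:0 for your system's domains. */
#include <stdio.h>

static long long read_energy_uj(void) {
    FILE *f = fopen("/sys/class/powercap/intel-rapl:0/energy_uj", "r");
    if (!f) return -1;
    long long uj = -1;
    if (fscanf(f, "%lld", &uj) != 1) uj = -1;
    fclose(f);
    return uj;
}

int main(void) {
    long long before = read_energy_uj();

    /* ... run the training or inference workload here ... */

    long long after = read_energy_uj();
    if (before >= 0 && after >= before)
        printf("package energy: %.3f J\n", (after - before) / 1e6);
    else
        printf("RAPL counter unavailable or wrapped\n");
    return 0;
}
```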
{"title":"Precise Energy Consumption Measurements of Heterogeneous Artificial Intelligence Workloads","authors":"R. Caspart, Sebastian Ziegler, Arvid Weyrauch, Holger Obermaier, Simon Raffeiner, Leonie Schuhmacher, J. Scholtyssek, D. Trofimova, M. Nolden, I. Reinartz, Fabian Isensee, Markus Goetz, C. Debus","doi":"10.48550/arXiv.2212.01698","DOIUrl":"https://doi.org/10.48550/arXiv.2212.01698","url":null,"abstract":". With the rise of artificial intelligence (AI) in recent years and the subsequent increase in complexity of the applied models, the growing demand in computational resources is starting to pose a signif-icant challenge. The need for higher compute power is being met with increasingly more potent accelerator hardware as well as the use of large and powerful compute clusters. However, the gain in prediction accuracy from large models trained on distributed and accelerated systems ulti-mately comes at the price of a substantial increase in energy demand, and researchers have started questioning the environmental friendliness of such AI methods at scale. Consequently, awareness of energy efficiency plays an important role for AI model developers and hardware infrastructure operators likewise. The energy consumption of AI workloads depends both on the model implementation and the composition of the utilized hardware. Therefore, accurate measurements of the power draw of respective AI workflows on different types of compute nodes is key to algorithmic improvements and the design of future compute clusters and hardware. Towards this end, we present measurements of the energy consumption of two typical applications of deep learning models on different types of heterogeneous compute nodes. Our results indicate that 1. contrary to common approaches, deriving energy consumption directly from runtime is not accurate, but the consumption of the compute node needs to be considered regarding its composition; 2. neglecting accelerator hardware on mixed nodes results in overproportional ineffi-ciency regarding energy consumption; 3. energy consumption of model training and inference should be considered separately – while training on GPUs outperforms all other node types regarding both runtime and energy consumption, inference on CPU nodes can be comparably efficient. One advantage of our approach is the fact that the information on energy consumption is available to all users of the supercomputer and not just those with administrator rights, enabling an easy transfer to other workloads alongside a raise in user-awareness of energy consumption.","PeriodicalId":345133,"journal":{"name":"ISC Workshops","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125602079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Workflows to driving high-performance interactive supercomputing for urgent decision making
Pub Date : 2022-06-28 | DOI: 10.48550/arXiv.2206.14103
Nick Brown, R. Nash, G. Gibb, E. Belikov, Artur Podobas, W. Chien, S. Markidis, M. Flatken, A. Gerndt
Interactive urgent computing is a small but growing consumer of supercomputing resources. However, there are numerous technical challenges that must be overcome to make supercomputers fully suited to the wide range of urgent workloads that could benefit from the computational power delivered by such instruments. An important question is how to connect the different components of an urgent workload, namely the users, the simulation codes, and the external data sources, in a structured and accessible manner. In this paper we explore the role of workflows both from the perspective of marshalling and controlling urgent workloads and at the level of the individual HPC machine, which ultimately requires two workflow systems. Using a space weather prediction urgent use-case, we explore the benefit these two workflow systems provide, especially when one exploits the flexibility enabled by their interoperation.
{"title":"Workflows to driving high-performance interactive supercomputing for urgent decision making","authors":"Nick Brown, R. Nash, G. Gibb, E. Belikov, Artur Podobas, W. Chien, S. Markidis, M. Flatken, A. Gerndt","doi":"10.48550/arXiv.2206.14103","DOIUrl":"https://doi.org/10.48550/arXiv.2206.14103","url":null,"abstract":". Interactive urgent computing is a small but growing user of supercomputing resources. However there are numerous technical challenges that must be overcome to make supercomputers fully suited to the wide range of urgent workloads which could benefit from the computational power delivered by such instruments. An important question is how to connect the different components of an urgent workload; namely the users, the simulation codes, and external data sources, together in a structured and accessible manner. In this paper we explore the role of workflows from both the perspective of marshalling and control of urgent workloads, and at the individual HPC machine level. Ultimately requiring two workflow systems, by using a space weather prediction urgent use-cases, we explore the benefit that these two workflow systems provide especially when one exploits the flexibility enabled by them interoperating.","PeriodicalId":345133,"journal":{"name":"ISC Workshops","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127585912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic Tuning of Tensorflow's CPU Backend using Gradient-Free Optimization Algorithms
Pub Date : 2021-09-13 | DOI: 10.1007/978-3-030-90539-2_17
Derssie Mebratu, N. Hasabnis, Pietro Mercati, Gaurit Sharma, S. Najnin
{"title":"Automatic Tuning of Tensorflow's CPU Backend using Gradient-Free Optimization Algorithms","authors":"Derssie Mebratu, N. Hasabnis, Pietro Mercati, Gaurit Sharma, S. Najnin","doi":"10.1007/978-3-030-90539-2_17","DOIUrl":"https://doi.org/10.1007/978-3-030-90539-2_17","url":null,"abstract":"","PeriodicalId":345133,"journal":{"name":"ISC Workshops","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128074663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Negative Perceptions About the Applicability of Source-to-Source Compilers in HPC: A Literature Review
Pub Date : 2021-07-01 | DOI: 10.1007/978-3-030-90539-2_16
Reed Milewicz, P. Pirkelbauer, Prema Soundararajan, H. Ahmed, A. Skjellum
{"title":"Negative Perceptions About the Applicability of Source-to-Source Compilers in HPC: A Literature Review","authors":"Reed Milewicz, P. Pirkelbauer, Prema Soundararajan, H. Ahmed, A. Skjellum","doi":"10.1007/978-3-030-90539-2_16","DOIUrl":"https://doi.org/10.1007/978-3-030-90539-2_16","url":null,"abstract":"","PeriodicalId":345133,"journal":{"name":"ISC Workshops","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129859667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lettuce: PyTorch-based Lattice Boltzmann Framework
Pub Date : 2021-06-24 | DOI: 10.1007/978-3-030-90539-2_3
Mario Bedrunka, D. Wilde, Martin L. Kliemank, D. Reith, H. Foysi, Andreas Krämer
{"title":"Lettuce: PyTorch-based Lattice Boltzmann Framework","authors":"Mario Bedrunka, D. Wilde, Martin L. Kliemank, D. Reith, H. Foysi, Andreas Krämer","doi":"10.1007/978-3-030-90539-2_3","DOIUrl":"https://doi.org/10.1007/978-3-030-90539-2_3","url":null,"abstract":"","PeriodicalId":345133,"journal":{"name":"ISC Workshops","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115676863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}