Runtime support for CPU-GPU high-performance computing on distributed memory platforms
Pub Date: 2024-07-19 | DOI: 10.3389/fhpcp.2024.1417040
Polykarpos Thomadakis, Nikos Chrisochoides
Hardware heterogeneity is here to stay for high-performance computing. Large-scale systems are currently equipped with multiple GPU accelerators per compute node and are expected to incorporate more specialized hardware. This shift in the computing ecosystem offers many opportunities for performance improvement; however, it also increases the complexity of programming for such architectures. This work introduces a runtime framework that enables effortless programming for heterogeneous systems while efficiently utilizing hardware resources. The framework is integrated within a distributed and scalable runtime system to facilitate performance portability across heterogeneous nodes. Along with the design, this paper describes the implementation and optimizations performed, achieving up to 300% improvement on a single device and linear scalability on a node equipped with four GPUs. In a distributed memory environment, the framework offers portable abstractions that enable efficient inter-node communication among devices with varying capabilities. It delivers superior performance compared to MPI+CUDA by up to 20% for large messages while keeping the overheads for small messages within 10%. Furthermore, the results of our performance evaluation in a distributed Jacobi proxy application demonstrate that our software imposes minimal overhead and achieves a performance improvement of up to 40%. This is accomplished by optimizations at the library level and by creating opportunities to leverage application-specific optimizations such as over-decomposition.
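To illustrate the over-decomposition idea the abstract refers to, the sketch below splits a 1-D Jacobi domain into more blocks than there are devices, so that a runtime could schedule blocks independently and overlap communication with computation. This is a minimal numpy illustration under assumed boundary values and block counts; it does not use the framework's actual API, and the blocks are swept sequentially purely to show the data flow.

```python
import numpy as np

def jacobi_sweep(block, left_halo, right_halo):
    """One Jacobi update on a 1-D block, using halo values from its neighbors."""
    padded = np.concatenate(([left_halo], block, [right_halo]))
    return 0.5 * (padded[:-2] + padded[2:])

def jacobi_overdecomposed(u, num_blocks=8, iterations=100):
    """Over-decompose the domain into more blocks than devices.

    Each block could be dispatched as an independent GPU task by a runtime;
    here the blocks are processed one after another to keep the sketch
    self-contained.
    """
    blocks = np.array_split(u, num_blocks)
    for _ in range(iterations):
        new_blocks = []
        for i, blk in enumerate(blocks):
            # Assumed fixed boundary value of 1.0 outside the global domain.
            left = blocks[i - 1][-1] if i > 0 else 1.0
            right = blocks[i + 1][0] if i < num_blocks - 1 else 1.0
            new_blocks.append(jacobi_sweep(blk, left, right))
        blocks = new_blocks
    return np.concatenate(blocks)

if __name__ == "__main__":
    print(jacobi_overdecomposed(np.zeros(1024))[:4])
```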
{"title":"Runtime support for CPU-GPU high-performance computing on distributed memory platforms","authors":"Polykarpos Thomadakis, Nikos Chrisochoides","doi":"10.3389/fhpcp.2024.1417040","DOIUrl":"https://doi.org/10.3389/fhpcp.2024.1417040","url":null,"abstract":"Hardware heterogeneity is here to stay for high-performance computing. Large-scale systems are currently equipped with multiple GPU accelerators per compute node and are expected to incorporate more specialized hardware. This shift in the computing ecosystem offers many opportunities for performance improvement; however, it also increases the complexity of programming for such architectures.This work introduces a runtime framework that enables effortless programming for heterogeneous systems while efficiently utilizing hardware resources. The framework is integrated within a distributed and scalable runtime system to facilitate performance portability across heterogeneous nodes. Along with the design, this paper describes the implementation and optimizations performed, achieving up to 300% improvement on a single device and linear scalability on a node equipped with four GPUs.The framework in a distributed memory environment offers portable abstractions that enable efficient inter-node communication among devices with varying capabilities. It delivers superior performance compared to MPI+CUDA by up to 20% for large messages while keeping the overheads for small messages within 10%. Furthermore, the results of our performance evaluation in a distributed Jacobi proxy application demonstrate that our software imposes minimal overhead and achieves a performance improvement of up to 40%.This is accomplished by the optimizations at the library level and by creating opportunities to leverage application-specific optimizations like over-decomposition.","PeriodicalId":399190,"journal":{"name":"Frontiers in High Performance Computing","volume":" 923","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141823137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using open-science workflow tools to produce SCEC CyberShake physics-based probabilistic seismic hazard models
Pub Date: 2024-05-01 | DOI: 10.3389/fhpcp.2024.1360720
S. Callaghan, P. Maechling, F. Silva, M. Su, K. Milner, Robert W. Graves, Kim B. Olsen, Yifeng Cui, K. Vahi, Albert Kottke, Christine A. Goulet, E. Deelman, Thomas H. Jordan, Y. Ben‐Zion
The Statewide (formerly Southern) California Earthquake Center (SCEC) conducts multidisciplinary earthquake system science research that aims to develop predictive models of earthquake processes, and to produce accurate seismic hazard information that can improve societal preparedness and resiliency to earthquake hazards. As part of this program, SCEC has developed the CyberShake platform, which calculates physics-based probabilistic seismic hazard analysis (PSHA) models for regions with high-quality seismic velocity and fault models. The CyberShake platform implements a sophisticated computational workflow that includes over 15 individual codes written by 6 developers. These codes are heterogeneous, ranging from short-running high-throughput serial CPU codes to large, long-running, parallel GPU codes. Additionally, CyberShake simulation campaigns are computationally extensive, typically producing tens of terabytes of meaningful scientific data and metadata over several months of around-the-clock execution on leadership-class supercomputers. To meet the needs of the CyberShake platform, we have developed an extreme-scale workflow stack, including the Pegasus Workflow Management System, HTCondor, Globus, and custom tools. We present this workflow software stack and identify how the CyberShake platform and supporting tools enable us to meet a variety of challenges that come with large-scale simulations, such as automated remote job submission, data management, and verification and validation. This platform enabled us to perform our most recent simulation campaign, CyberShake Study 22.12, from December 2022 to April 2023. During this time, our workflow tools executed approximately 32,000 jobs, and used up to 73% of the Summit system at Oak Ridge Leadership Computing Facility. Our workflow tools managed about 2.5 PB of total temporary and output data, and automatically staged 19 million output files totaling 74 TB back to archival storage on the University of Southern California's Center for Advanced Research Computing systems, including file-based relational data and large binary files to efficiently store millions of simulated seismograms. CyberShake extreme-scale workflows have generated simulation-based probabilistic seismic hazard models that are being used by seismological, engineering, and governmental communities.
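As an illustration of how a heterogeneous workflow like this is typically expressed, the sketch below builds a two-job DAG in the style of the Pegasus 5.x Python API, one of the workflow tools named above. The transformation names, file names, and arguments are hypothetical placeholders rather than CyberShake's real codes, and a complete run would additionally require transformation and site catalogs before planning.

```python
from Pegasus.api import File, Job, Workflow

# Hypothetical logical files exchanged between the two jobs.
velocity_model = File("velocity_model.bin")
seismograms = File("seismograms.bin")
hazard_curve = File("hazard_curve.csv")

# A long-running GPU simulation followed by a short post-processing step.
simulate = (
    Job("wave_propagation_sim")                      # placeholder transformation name
    .add_args("--model", velocity_model, "--out", seismograms)
    .add_inputs(velocity_model)
    .add_outputs(seismograms)
)
postprocess = (
    Job("hazard_curve_calc")                         # placeholder transformation name
    .add_args("--in", seismograms, "--out", hazard_curve)
    .add_inputs(seismograms)
    .add_outputs(hazard_curve)
)

wf = Workflow("cybershake-style-example")
wf.add_jobs(simulate, postprocess)  # the dependency follows from the shared file
wf.write("workflow.yml")            # abstract workflow, to be planned and submitted
```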
{"title":"Using open-science workflow tools to produce SCEC CyberShake physics-based probabilistic seismic hazard models","authors":"S. Callaghan, P. Maechling, F. Silva, M. Su, K. Milner, Robert W. Graves, Kim B. Olsen, Yifeng Cui, K. Vahi, Albert Kottke, Christine A. Goulet, E. Deelman, Thomas H. Jordan, Y. Ben‐Zion","doi":"10.3389/fhpcp.2024.1360720","DOIUrl":"https://doi.org/10.3389/fhpcp.2024.1360720","url":null,"abstract":"The Statewide (formerly Southern) California Earthquake Center (SCEC) conducts multidisciplinary earthquake system science research that aims to develop predictive models of earthquake processes, and to produce accurate seismic hazard information that can improve societal preparedness and resiliency to earthquake hazards. As part of this program, SCEC has developed the CyberShake platform, which calculates physics-based probabilistic seismic hazard analysis (PSHA) models for regions with high-quality seismic velocity and fault models. The CyberShake platform implements a sophisticated computational workflow that includes over 15 individual codes written by 6 developers. These codes are heterogeneous, ranging from short-running high-throughput serial CPU codes to large, long-running, parallel GPU codes. Additionally, CyberShake simulation campaigns are computationally extensive, typically producing tens of terabytes of meaningful scientific data and metadata over several months of around-the-clock execution on leadership-class supercomputers. To meet the needs of the CyberShake platform, we have developed an extreme-scale workflow stack, including the Pegasus Workflow Management System, HTCondor, Globus, and custom tools. We present this workflow software stack and identify how the CyberShake platform and supporting tools enable us to meet a variety of challenges that come with large-scale simulations, such as automated remote job submission, data management, and verification and validation. This platform enabled us to perform our most recent simulation campaign, CyberShake Study 22.12, from December 2022 to April 2023. During this time, our workflow tools executed approximately 32,000 jobs, and used up to 73% of the Summit system at Oak Ridge Leadership Computing Facility. Our workflow tools managed about 2.5 PB of total temporary and output data, and automatically staged 19 million output files totaling 74 TB back to archival storage on the University of Southern California's Center for Advanced Research Computing systems, including file-based relational data and large binary files to efficiently store millions of simulated seismograms. CyberShake extreme-scale workflows have generated simulation-based probabilistic seismic hazard models that are being used by seismological, engineering, and governmental communities.","PeriodicalId":399190,"journal":{"name":"Frontiers in High Performance Computing","volume":"42 14","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141030381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The fast and the capacious: memory-efficient multi-GPU accelerated explicit state space exploration with GPUexplore 3.0
Pub Date: 2024-03-13 | DOI: 10.3389/fhpcp.2024.1285349
Anton Wijs, Muhammad Osama
The GPU acceleration of explicit state space exploration for explicit-state model checking has been the subject of previous research, but to date, the tools have been limited in their applicability and practical use. Building on this research, we are, to our knowledge, the first to use a tree database on GPUs, which allows high-performance, memory-efficient storage of states in the form of binary trees. Besides the tree compression this enables, we also propose two new hashing schemes, compact-cuckoo and compact multiple-functions, which enable the use of Cleary compression to compactly store tree roots. Besides an in-depth discussion of the tree database algorithms, we present the input language and workflow of our tool, GPUexplore 3.0. Finally, we explain how the algorithms can be extended to exploit multiple GPUs residing on the same machine. Experiments show single-GPU processing speeds of up to 144 million states per second, compared to the 20 million achieved by 32-core LTSmin. In the multi-GPU setting, workload and storage distributions are optimal, and performance is frequently even positively impacted when the number of GPUs is increased. Overall, a logarithmic acceleration of up to 1.9× was achieved with four GPUs, compared to what was achieved with one and two GPUs. We believe that a linear speedup can be easily accomplished with faster P2P communication between the GPUs.
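To make the tree-database idea concrete, the following is a small CPU-side Python sketch of tree compression for fixed-length state vectors: a state is folded bottom-up into a binary tree of pairs, identical subtrees are stored only once in a node table, and the root index then identifies the whole state. This is a generic illustration of the compression technique only; it is not GPUexplore 3.0's GPU hash-table implementation, and it omits the compact-cuckoo and Cleary-compression schemes discussed above.

```python
class TreeTable:
    """Stores each distinct (depth, left, right) triple once; a state is named by its root id."""

    def __init__(self):
        self._index = {}   # (depth, left, right) -> node id
        self._nodes = []   # node id -> (depth, left, right)

    def _intern(self, key):
        node_id = self._index.get(key)
        if node_id is None:
            node_id = len(self._nodes)
            self._index[key] = node_id
            self._nodes.append(key)
        return node_id

    def insert(self, state):
        """Fold a state vector (length a power of two) into the tree, bottom up."""
        level, depth = list(state), 0
        while len(level) > 1:
            level = [self._intern((depth, level[i], level[i + 1]))
                     for i in range(0, len(level), 2)]
            depth += 1
        return level[0]  # equal states map to the same root id

    def lookup(self, root, length):
        """Reconstruct the original state vector from its root id."""
        level = [root]
        while len(level) < length:
            level = [part for node in level for part in self._nodes[node][1:]]
        return level


table = TreeTable()
s1 = table.insert((1, 4, 0, 7))
s2 = table.insert((1, 4, 0, 9))        # shares the (1, 4) subtree with s1
assert table.lookup(s1, 4) == [1, 4, 0, 7]
```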
{"title":"The fast and the capacious: memory-efficient multi-GPU accelerated explicit state space exploration with GPUexplore 3.0","authors":"Anton Wijs, Muhammad Osama","doi":"10.3389/fhpcp.2024.1285349","DOIUrl":"https://doi.org/10.3389/fhpcp.2024.1285349","url":null,"abstract":"The GPU acceleration of explicit state space exploration, for explicit-state model checking, has been the subject of previous research, but to date, the tools have been limited in their applicability and in their practical use. Considering this research, to our knowledge, we are the first to use a novel tree database for GPUs. This novel tree database allows high-performant, memory-efficient storage of states in the form of binary trees. Besides the tree compression this enables, we also propose two new hashing schemes, compact-cuckoo and compact multiple-functions. These schemes enable the use of Cleary compression to compactly store tree roots. Besides an in-depth discussion of the tree database algorithms, the input language and workflow of our tool, called GPUexplore 3.0, are presented. Finally, we explain how the algorithms can be extended to exploit multiple GPUs that reside on the same machine. Experiments show single-GPU processing speeds of up to 144 million states per second compared to 20 million states achieved by 32-core LTSmin. In the multi-GPU setting, workload and storage distributions are optimal, and, frequently, performance is even positively impacted when the number of GPUs is increased. Overall, a logarithmic acceleration up to 1.9× was achieved with four GPUs, compared to what was achieved with one and two GPUs. We believe that a linear speedup can be easily accomplished with faster P2P communications between the GPUs.","PeriodicalId":399190,"journal":{"name":"Frontiers in High Performance Computing","volume":"137 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140247278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Asgard: Are NoSQL databases suitable for ephemeral data in serverless workloads?
Pub Date: 2023-09-04 | DOI: 10.3389/fhpcp.2023.1127883
Karthick Shankar, Ashraf Y. Mahgoub, Zihan Zhou, Utkarsh Priyam, S. Chaterji
Serverless computing platforms are becoming increasingly popular for data analytics applications due to their low management overhead and granular billing strategies. Such analytics frameworks use a Directed Acyclic Graph (DAG) structure, in which serverless functions, which are fine-grained tasks, are represented as nodes and data dependencies between the functions are represented as edges. Passing intermediate (ephemeral) data from one function to another has been receiving attention of late, with works proposing various storage systems and methods of optimizing them. The state-of-practice method is to pass the ephemeral data through remote storage, which is either disk-based (e.g., Amazon S3) and slow, or memory-based (e.g., ElastiCache Redis) and expensive. Despite the potential of some prominent NoSQL databases, like Apache Cassandra and ScyllaDB, which utilize both memory and disk, prevailing opinions suggest they are ill-suited for ephemeral data, being tailored more for long-term storage. In our study, titled Asgard, we rigorously examine this assumption. Using Amazon Web Services (AWS) as a testbed with two popular serverless applications, we explore scenarios like fanout and varying workloads, gauging the performance benefits of configuring NoSQL databases in a DAG-aware way. Surprisingly, we found that, in terms of end-to-end latency normalized by dollar cost, Apache Cassandra's default setup surpassed Redis by up to 326% and S3 by up to 189%. When optimized with Asgard, Cassandra outdid its own default configuration by up to 47%. This underscores specific instances where NoSQL databases can outshine the current state of practice.
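For readers who want to see what "passing ephemeral data through a NoSQL database" looks like in code, below is a minimal sketch using the DataStax Python driver for Cassandra. The contact point, keyspace, table, and column names are hypothetical, and the DAG-aware tuning that Asgard applies (replication, consistency, and compaction settings per DAG stage) is deliberately not shown.

```python
from cassandra.cluster import Cluster

# Hypothetical contact point and schema for intermediate DAG data; these are
# exactly the knobs a DAG-aware configuration would tune (replication factor,
# consistency level, compaction strategy, TTLs for short-lived data).
cluster = Cluster(["cassandra.internal.example"])
session = cluster.connect()
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS ephemeral
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS ephemeral.stage_output (
        dag_id text, stage text, shard int, payload blob,
        PRIMARY KEY ((dag_id, stage), shard)
    )
""")

def put_stage_output(dag_id: str, stage: str, shard: int, data: bytes) -> None:
    """Called at the end of an upstream serverless function."""
    session.execute(
        "INSERT INTO ephemeral.stage_output (dag_id, stage, shard, payload) "
        "VALUES (%s, %s, %s, %s)",
        (dag_id, stage, shard, data),
    )

def get_stage_output(dag_id: str, stage: str, shard: int) -> bytes:
    """Called at the start of the downstream function in the DAG."""
    row = session.execute(
        "SELECT payload FROM ephemeral.stage_output "
        "WHERE dag_id = %s AND stage = %s AND shard = %s",
        (dag_id, stage, shard),
    ).one()
    return row.payload
```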
{"title":"Asgard: Are NoSQL databases suitable for ephemeral data in serverless workloads?","authors":"Karthick Shankar, Ashraf Y. Mahgoub, Zihan Zhou, Utkarsh Priyam, S. Chaterji","doi":"10.3389/fhpcp.2023.1127883","DOIUrl":"https://doi.org/10.3389/fhpcp.2023.1127883","url":null,"abstract":"Serverless computing platforms are becoming increasingly popular for data analytics applications due to their low management overhead and granular billing strategies. Such analytics frameworks use a Directed Acyclic Graph (DAG) structure, in which serverless functions, which are fine-grained tasks, are represented as nodes and data-dependencies between the functions are represented as edges. Passing intermediate (ephemeral) data from one function to another has been receiving attention of late, with works proposing various storage systems and methods of optimization for them. The state-of-practice method is to pass the ephemeral data through remote storage, either disk-based (e.g., Amazon S3), which is slow, or memory-based (e.g., ElastiCache Redis), which is expensive. Despite the potential of some prominent NoSQL databases, like Apache Cassandra and ScyllaDB, which utilize both memory and disk, prevailing opinions suggest they are ill-suited for ephemeral data, being tailored more for long-term storage. In our study, titled Asgard, we rigorously examine this assumption. Using Amazon Web Services (AWS) as a testbed with two popular serverless applications, we explore scenarios like fanout and varying workloads, gauging the performance benefits of configuring NoSQL databases in a DAG-aware way. Surprisingly, we found that, per end-to-end latency normalized by $ cost, Apache Cassandra's default setup surpassed Redis by up to 326% and S3 by up to 189%. When optimized with Asgard, Cassandra outdid its own default configuration by up to 47%. This underscores specific instances where NoSQL databases can outshine the current state-of-practice.","PeriodicalId":399190,"journal":{"name":"Frontiers in High Performance Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131161215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SNDVI: a new scalable serverless framework to compute NDVI
Pub Date: 2023-08-25 | DOI: 10.3389/fhpcp.2023.1151530
Lucas Iacono, David Pacios, J. L. Vázquez-Poletti
Farmers and agronomists require crop health metrics to monitor plantations and detect problems like diseases or droughts at an early stage. This enables them to implement measures to address crop problems. The use of multispectral images and cloud computing is conducive to obtaining such metrics. Drones and satellites capture extensive multispectral image datasets, while the cloud facilitates the storage of these images and provides execution services for extracting crop health metrics, such as the Normalized Difference Vegetation Index (NDVI). The use of the cloud to compute NDVI poses new research challenges, such as determining which cloud technology offers the optimal balance of execution time and monetary cost. In this article, we present Serverless NDVI (SNDVI), a new framework based on serverless computing for NDVI computation. The objective of SNDVI is to minimize the monetary costs and computing times associated with using a public cloud while processing NDVI from large datasets. One of SNDVI's key contributions is to crop the dataset into subsegments, leveraging Lambda's ability to run up to 1,000 NDVI computing functions in parallel, one per subsegment. We deployed SNDVI using Amazon Lambda and conducted two experiments to analyze and validate its performance. Both experiments focused on two key metrics: (i) execution time and (ii) monetary cost. The first experiment involved executing SNDVI to extract NDVI from a multispectral dataset, with the objective of evaluating the overall SNDVI functionality, assessing its performance, and verifying the quality of its output. In the second experiment, we conducted a benchmarking analysis comparing SNDVI with an EC2-based NDVI computing architecture. Results from the first experiment demonstrated that the processing times for the entire SNDVI execution ranged from 9 to 15 seconds, with a total cost (including storage) of 4.19 USD. Results from the second experiment revealed that the monetary costs of EC2 and Lambda were similar, but SNDVI computed the NDVI 411 times faster than the EC2 architecture. In conclusion, the investigation reported in this paper demonstrates that SNDVI successfully achieves its goals and that serverless computing presents a promising alternative to traditional cloud services for NDVI computation.
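The per-pixel computation at the core of the framework is the standard index NDVI = (NIR - Red) / (NIR + Red). The sketch below shows this computation shaped as an AWS Lambda handler for one subsegment; the event fields, band layout, and return value are hypothetical, and the real SNDVI functions (including how subsegments are fetched from storage) are not reproduced here.

```python
import numpy as np

def compute_ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """NDVI = (NIR - Red) / (NIR + Red), guarding against division by zero."""
    nir = nir.astype(np.float32)
    red = red.astype(np.float32)
    denom = nir + red
    return np.divide(nir - red, denom, out=np.zeros_like(denom), where=denom != 0)

def handler(event, context):
    """Hypothetical Lambda entry point for one subsegment of a multispectral image.

    In the real framework each of the (up to 1,000) parallel invocations would
    load its subsegment from object storage; here the two bands arrive in the
    event as nested lists purely to keep the sketch self-contained.
    """
    nir = np.array(event["nir_band"])
    red = np.array(event["red_band"])
    ndvi = compute_ndvi(nir, red)
    return {"segment_id": event.get("segment_id"), "mean_ndvi": float(ndvi.mean())}
```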
{"title":"SNDVI: a new scalable serverless framework to compute NDVI","authors":"Lucas Iacono, David Pacios, J. L. Vázquez-Poletti","doi":"10.3389/fhpcp.2023.1151530","DOIUrl":"https://doi.org/10.3389/fhpcp.2023.1151530","url":null,"abstract":"Farmers and agronomists require crop health metrics to monitor plantations and detect problems like diseases or droughts at an early stage. This enables them to implement measures to address crop problems. The use of multispectral images and cloud computing is conducive to obtaining such metrics. Drones and satellites capture extensive multispectral image datasets, while the cloud facilitates the storage of these images and provides execution services for extracting crop health metrics, such as the Normalized Difference Vegetation Index (NDVI). The use of the Cloud to compute NDVI poses new research challenges, such as determining which cloud technology offers the optimal balance of execution time and monetary cost. In this article, we present Serverless NDVI (SNDVI), a new framework based on serverless computing for NDVI computation. The objective of SNDVI is to minimize the monetary costs and computing times associated with using a Public Cloud while processing NDVI from large datasets. One of SNDVI's key contributions is to crop the dataset into subsegments to leverage Lambda's ability to run up to 1,000 NDVI computing functions in parallel on each subsegment. We deployed SNDVI using Amazon Lambda and conducted two experiments to analyze and validate its performance. Both experiments focused on two key metrics: (i) execution time and (ii) monetary costs. The first experiment involved executing SNDVI to extract NDVI from a multispectral dataset. The objective was to evaluate the overall SNDVI functionality, assess its performance, and verify the quality of SNDVI output. In the second experiment, we conducted a benchmarking analysis comparing SNDVI with an EC2-based NDVI computing architecture. Results from the first experiment demonstrated that the processing times for the entire SNDVI execution ranged from 9 to 15 seconds, with a total cost (including storage) of 4.19 USD. Results from the second experiment revealed that the monetary costs of EC2 and Lambda were similar, but the computing time for SNDVI was 411 times faster than the EC2 architecture. In conclusion, the investigation reported in this paper demonstrates that SNDVI successfully achieves its goals and that Serverless Computing presents a promising native serverless alternative to traditional cloud services for NDVI computation.","PeriodicalId":399190,"journal":{"name":"Frontiers in High Performance Computing","volume":"113 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124181548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Auto-scaling edge cloud for network slicing
Pub Date: 2023-06-09 | DOI: 10.3389/fhpcp.2023.1167162
EmadelDin A. Mazied, Dimitrios S. Nikolopoulos, Y. Hanafy, S. Midkiff
This paper presents a study on resource control for autoscaling virtual radio access networks (RAN slices) in next-generation wireless networks. The dynamic instantiation and termination of on-demand RAN slices require efficient autoscaling of computational resources at the edge. Autoscaling involves vertical scaling (VS) and horizontal scaling (HS) to adapt resource allocation based on demand variations. However, the strict processing time requirements for RAN slices pose challenges when instantiating new containers. To address this issue, we propose removing resource limits from slice configuration and leveraging the decision-making capabilities of a centralized slicing controller. We introduce a resource control agent (RC) that determines resource limits as the number of computing resources packed into containers, aiming to minimize deployment costs while maintaining processing time below a threshold. The RAN slicing workload is modeled using the Low-Density Parity Check (LDPC) decoding algorithm, known for its stochastic demands. We formulate the problem as a variant of the stochastic bin packing problem (SBPP) to satisfy the random variations in radio workload. By employing chance-constrained programming, we approach the SBPP resource control (S-RC) problem. Our numerical evaluation demonstrates that S-RC maintains the processing time requirement with a higher probability compared to configuring RAN slices with predefined limits, although it introduces a 45% overall average cost overhead.
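For concreteness, a generic chance-constrained form of the stochastic bin packing problem can be written as below, where x_{ij} assigns workload i to container j, y_j marks a deployed container, C is the container capacity, and epsilon is the allowed violation probability. The notation and the Gaussian-demand deterministic equivalent are the common textbook form, stated here as an illustration rather than the paper's exact model.

```latex
\begin{align}
\min_{x,y}\;& \sum_{j} y_j
  && \text{(containers deployed)} \\
\text{s.t.}\;& \Pr\!\Big(\sum_{i} d_i\, x_{ij} \le C\Big) \ge 1 - \epsilon
  && \forall j \quad \text{(capacity / processing-time chance constraint)} \\
& \sum_{j} x_{ij} = 1 \;\; \forall i,
  \qquad x_{ij} \le y_j,
  \qquad x_{ij},\, y_j \in \{0,1\}.
\end{align}

% With independent Gaussian demands d_i ~ N(mu_i, sigma_i^2), the chance
% constraint has the standard deterministic equivalent:
\begin{equation}
\sum_{i} \mu_i\, x_{ij} \;+\; z_{1-\epsilon}
  \sqrt{\sum_{i} \sigma_i^{2}\, x_{ij}} \;\le\; C\, y_j \qquad \forall j,
\end{equation}
% where z_{1-epsilon} is the (1-epsilon) quantile of the standard normal distribution.
```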
{"title":"Auto-scaling edge cloud for network slicing","authors":"EmadelDin A. Mazied, Dimitrios S. Nikolopoulos, Y. Hanafy, S. Midkiff","doi":"10.3389/fhpcp.2023.1167162","DOIUrl":"https://doi.org/10.3389/fhpcp.2023.1167162","url":null,"abstract":"This paper presents a study on resource control for autoscaling virtual radio access networks (RAN slices) in next-generation wireless networks. The dynamic instantiation and termination of on-demand RAN slices require efficient autoscaling of computational resources at the edge. Autoscaling involves vertical scaling (VS) and horizontal scaling (HS) to adapt resource allocation based on demand variations. However, the strict processing time requirements for RAN slices pose challenges when instantiating new containers. To address this issue, we propose removing resource limits from slice configuration and leveraging the decision-making capabilities of a centralized slicing controller. We introduce a resource control agent (RC) that determines resource limits as the number of computing resources packed into containers, aiming to minimize deployment costs while maintaining processing time below a threshold. The RAN slicing workload is modeled using the Low-Density Parity Check (LDPC) decoding algorithm, known for its stochastic demands. We formulate the problem as a variant of the stochastic bin packing problem (SBPP) to satisfy the random variations in radio workload. By employing chance-constrained programming, we approach the SBPP resource control (S-RC) problem. Our numerical evaluation demonstrates that S-RC maintains the processing time requirement with a higher probability compared to configuring RAN slices with predefined limits, although it introduces a 45% overall average cost overhead.","PeriodicalId":399190,"journal":{"name":"Frontiers in High Performance Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129802851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}