Runtime support for CPU-GPU high-performance computing on distributed memory platforms
Pub Date: 2024-07-19 | DOI: 10.3389/fhpcp.2024.1417040
Polykarpos Thomadakis, Nikos Chrisochoides
Hardware heterogeneity is here to stay for high-performance computing. Large-scale systems are currently equipped with multiple GPU accelerators per compute node and are expected to incorporate more specialized hardware. This shift in the computing ecosystem offers many opportunities for performance improvement; however, it also increases the complexity of programming for such architectures. This work introduces a runtime framework that enables effortless programming for heterogeneous systems while efficiently utilizing hardware resources. The framework is integrated within a distributed and scalable runtime system to facilitate performance portability across heterogeneous nodes. Along with the design, this paper describes the implementation and optimizations performed, achieving up to 300% improvement on a single device and linear scalability on a node equipped with four GPUs. In a distributed memory environment, the framework offers portable abstractions that enable efficient inter-node communication among devices with varying capabilities. It delivers superior performance compared to MPI+CUDA by up to 20% for large messages while keeping the overheads for small messages within 10%. Furthermore, the results of our performance evaluation in a distributed Jacobi proxy application demonstrate that our software imposes minimal overhead and achieves a performance improvement of up to 40%. This is accomplished by optimizations at the library level and by creating opportunities to leverage application-specific optimizations such as over-decomposition.
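To illustrate the over-decomposition idea the abstract refers to, the sketch below splits a 1-D Jacobi domain into more blocks than there are devices, so that a runtime could schedule blocks independently and overlap communication with computation. This is a minimal numpy illustration under assumed boundary values and block counts; it does not use the framework's actual API, and the blocks are swept sequentially purely to show the data flow.

```python
import numpy as np

def jacobi_sweep(block, left_halo, right_halo):
    """One Jacobi update on a 1-D block, using halo values from its neighbors."""
    padded = np.concatenate(([left_halo], block, [right_halo]))
    return 0.5 * (padded[:-2] + padded[2:])

def jacobi_overdecomposed(u, num_blocks=8, iterations=100):
    """Over-decompose the domain into more blocks than devices.

    Each block could be dispatched as an independent GPU task by a runtime;
    here the blocks are processed one after another to keep the sketch
    self-contained.
    """
    blocks = np.array_split(u, num_blocks)
    for _ in range(iterations):
        new_blocks = []
        for i, blk in enumerate(blocks):
            # Assumed fixed boundary value of 1.0 outside the global domain.
            left = blocks[i - 1][-1] if i > 0 else 1.0
            right = blocks[i + 1][0] if i < num_blocks - 1 else 1.0
            new_blocks.append(jacobi_sweep(blk, left, right))
        blocks = new_blocks
    return np.concatenate(blocks)

if __name__ == "__main__":
    print(jacobi_overdecomposed(np.zeros(1024))[:4])
```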
{"title":"Runtime support for CPU-GPU high-performance computing on distributed memory platforms","authors":"Polykarpos Thomadakis, Nikos Chrisochoides","doi":"10.3389/fhpcp.2024.1417040","DOIUrl":"https://doi.org/10.3389/fhpcp.2024.1417040","url":null,"abstract":"Hardware heterogeneity is here to stay for high-performance computing. Large-scale systems are currently equipped with multiple GPU accelerators per compute node and are expected to incorporate more specialized hardware. This shift in the computing ecosystem offers many opportunities for performance improvement; however, it also increases the complexity of programming for such architectures.This work introduces a runtime framework that enables effortless programming for heterogeneous systems while efficiently utilizing hardware resources. The framework is integrated within a distributed and scalable runtime system to facilitate performance portability across heterogeneous nodes. Along with the design, this paper describes the implementation and optimizations performed, achieving up to 300% improvement on a single device and linear scalability on a node equipped with four GPUs.The framework in a distributed memory environment offers portable abstractions that enable efficient inter-node communication among devices with varying capabilities. It delivers superior performance compared to MPI+CUDA by up to 20% for large messages while keeping the overheads for small messages within 10%. Furthermore, the results of our performance evaluation in a distributed Jacobi proxy application demonstrate that our software imposes minimal overhead and achieves a performance improvement of up to 40%.This is accomplished by the optimizations at the library level and by creating opportunities to leverage application-specific optimizations like over-decomposition.","PeriodicalId":399190,"journal":{"name":"Frontiers in High Performance Computing","volume":" 923","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141823137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using open-science workflow tools to produce SCEC CyberShake physics-based probabilistic seismic hazard models
Pub Date: 2024-05-01 | DOI: 10.3389/fhpcp.2024.1360720
S. Callaghan, P. Maechling, F. Silva, M. Su, K. Milner, Robert W. Graves, Kim B. Olsen, Yifeng Cui, K. Vahi, Albert Kottke, Christine A. Goulet, E. Deelman, Thomas H. Jordan, Y. Ben‐Zion
The Statewide (formerly Southern) California Earthquake Center (SCEC) conducts multidisciplinary earthquake system science research that aims to develop predictive models of earthquake processes, and to produce accurate seismic hazard information that can improve societal preparedness and resiliency to earthquake hazards. As part of this program, SCEC has developed the CyberShake platform, which calculates physics-based probabilistic seismic hazard analysis (PSHA) models for regions with high-quality seismic velocity and fault models. The CyberShake platform implements a sophisticated computational workflow that includes over 15 individual codes written by 6 developers. These codes are heterogeneous, ranging from short-running high-throughput serial CPU codes to large, long-running, parallel GPU codes. Additionally, CyberShake simulation campaigns are computationally extensive, typically producing tens of terabytes of meaningful scientific data and metadata over several months of around-the-clock execution on leadership-class supercomputers. To meet the needs of the CyberShake platform, we have developed an extreme-scale workflow stack, including the Pegasus Workflow Management System, HTCondor, Globus, and custom tools. We present this workflow software stack and identify how the CyberShake platform and supporting tools enable us to meet a variety of challenges that come with large-scale simulations, such as automated remote job submission, data management, and verification and validation. This platform enabled us to perform our most recent simulation campaign, CyberShake Study 22.12, from December 2022 to April 2023. During this time, our workflow tools executed approximately 32,000 jobs, and used up to 73% of the Summit system at Oak Ridge Leadership Computing Facility. Our workflow tools managed about 2.5 PB of total temporary and output data, and automatically staged 19 million output files totaling 74 TB back to archival storage on the University of Southern California's Center for Advanced Research Computing systems, including file-based relational data and large binary files to efficiently store millions of simulated seismograms. CyberShake extreme-scale workflows have generated simulation-based probabilistic seismic hazard models that are being used by seismological, engineering, and governmental communities.
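As an illustration of how a heterogeneous workflow like this is typically expressed, the sketch below builds a two-job DAG in the style of the Pegasus 5.x Python API, one of the workflow tools named above. The transformation names, file names, and arguments are hypothetical placeholders rather than CyberShake's real codes, and a complete run would additionally require transformation and site catalogs before planning.

```python
from Pegasus.api import File, Job, Workflow

# Hypothetical logical files exchanged between the two jobs.
velocity_model = File("velocity_model.bin")
seismograms = File("seismograms.bin")
hazard_curve = File("hazard_curve.csv")

# A long-running GPU simulation followed by a short post-processing step.
simulate = (
    Job("wave_propagation_sim")                      # placeholder transformation name
    .add_args("--model", velocity_model, "--out", seismograms)
    .add_inputs(velocity_model)
    .add_outputs(seismograms)
)
postprocess = (
    Job("hazard_curve_calc")                         # placeholder transformation name
    .add_args("--in", seismograms, "--out", hazard_curve)
    .add_inputs(seismograms)
    .add_outputs(hazard_curve)
)

wf = Workflow("cybershake-style-example")
wf.add_jobs(simulate, postprocess)  # the dependency follows from the shared file
wf.write("workflow.yml")            # abstract workflow, to be planned and submitted
```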
{"title":"Using open-science workflow tools to produce SCEC CyberShake physics-based probabilistic seismic hazard models","authors":"S. Callaghan, P. Maechling, F. Silva, M. Su, K. Milner, Robert W. Graves, Kim B. Olsen, Yifeng Cui, K. Vahi, Albert Kottke, Christine A. Goulet, E. Deelman, Thomas H. Jordan, Y. Ben‐Zion","doi":"10.3389/fhpcp.2024.1360720","DOIUrl":"https://doi.org/10.3389/fhpcp.2024.1360720","url":null,"abstract":"The Statewide (formerly Southern) California Earthquake Center (SCEC) conducts multidisciplinary earthquake system science research that aims to develop predictive models of earthquake processes, and to produce accurate seismic hazard information that can improve societal preparedness and resiliency to earthquake hazards. As part of this program, SCEC has developed the CyberShake platform, which calculates physics-based probabilistic seismic hazard analysis (PSHA) models for regions with high-quality seismic velocity and fault models. The CyberShake platform implements a sophisticated computational workflow that includes over 15 individual codes written by 6 developers. These codes are heterogeneous, ranging from short-running high-throughput serial CPU codes to large, long-running, parallel GPU codes. Additionally, CyberShake simulation campaigns are computationally extensive, typically producing tens of terabytes of meaningful scientific data and metadata over several months of around-the-clock execution on leadership-class supercomputers. To meet the needs of the CyberShake platform, we have developed an extreme-scale workflow stack, including the Pegasus Workflow Management System, HTCondor, Globus, and custom tools. We present this workflow software stack and identify how the CyberShake platform and supporting tools enable us to meet a variety of challenges that come with large-scale simulations, such as automated remote job submission, data management, and verification and validation. This platform enabled us to perform our most recent simulation campaign, CyberShake Study 22.12, from December 2022 to April 2023. During this time, our workflow tools executed approximately 32,000 jobs, and used up to 73% of the Summit system at Oak Ridge Leadership Computing Facility. Our workflow tools managed about 2.5 PB of total temporary and output data, and automatically staged 19 million output files totaling 74 TB back to archival storage on the University of Southern California's Center for Advanced Research Computing systems, including file-based relational data and large binary files to efficiently store millions of simulated seismograms. CyberShake extreme-scale workflows have generated simulation-based probabilistic seismic hazard models that are being used by seismological, engineering, and governmental communities.","PeriodicalId":399190,"journal":{"name":"Frontiers in High Performance Computing","volume":"42 14","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141030381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The fast and the capacious: memory-efficient multi-GPU accelerated explicit state space exploration with GPUexplore 3.0
Pub Date: 2024-03-13 | DOI: 10.3389/fhpcp.2024.1285349
Anton Wijs, Muhammad Osama
The GPU acceleration of explicit state space exploration for explicit-state model checking has been the subject of previous research, but to date, the tools have been limited in their applicability and practical use. Building on this research, we are, to our knowledge, the first to use a tree database on GPUs, which allows high-performance, memory-efficient storage of states in the form of binary trees. Besides the tree compression this enables, we also propose two new hashing schemes, compact-cuckoo and compact multiple-functions, which enable the use of Cleary compression to compactly store tree roots. Besides an in-depth discussion of the tree database algorithms, we present the input language and workflow of our tool, GPUexplore 3.0. Finally, we explain how the algorithms can be extended to exploit multiple GPUs residing on the same machine. Experiments show single-GPU processing speeds of up to 144 million states per second, compared to the 20 million achieved by 32-core LTSmin. In the multi-GPU setting, workload and storage distributions are optimal, and performance is frequently even positively impacted when the number of GPUs is increased. Overall, a logarithmic acceleration of up to 1.9× was achieved with four GPUs, compared to what was achieved with one and two GPUs. We believe that a linear speedup can be easily accomplished with faster P2P communication between the GPUs.
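To make the tree-database idea concrete, the following is a small CPU-side Python sketch of tree compression for fixed-length state vectors: a state is folded bottom-up into a binary tree of pairs, identical subtrees are stored only once in a node table, and the root index then identifies the whole state. This is a generic illustration of the compression technique only; it is not GPUexplore 3.0's GPU hash-table implementation, and it omits the compact-cuckoo and Cleary-compression schemes discussed above.

```python
class TreeTable:
    """Stores each distinct (depth, left, right) triple once; a state is named by its root id."""

    def __init__(self):
        self._index = {}   # (depth, left, right) -> node id
        self._nodes = []   # node id -> (depth, left, right)

    def _intern(self, key):
        node_id = self._index.get(key)
        if node_id is None:
            node_id = len(self._nodes)
            self._index[key] = node_id
            self._nodes.append(key)
        return node_id

    def insert(self, state):
        """Fold a state vector (length a power of two) into the tree, bottom up."""
        level, depth = list(state), 0
        while len(level) > 1:
            level = [self._intern((depth, level[i], level[i + 1]))
                     for i in range(0, len(level), 2)]
            depth += 1
        return level[0]  # equal states map to the same root id

    def lookup(self, root, length):
        """Reconstruct the original state vector from its root id."""
        level = [root]
        while len(level) < length:
            level = [part for node in level for part in self._nodes[node][1:]]
        return level


table = TreeTable()
s1 = table.insert((1, 4, 0, 7))
s2 = table.insert((1, 4, 0, 9))        # shares the (1, 4) subtree with s1
assert table.lookup(s1, 4) == [1, 4, 0, 7]
```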
{"title":"The fast and the capacious: memory-efficient multi-GPU accelerated explicit state space exploration with GPUexplore 3.0","authors":"Anton Wijs, Muhammad Osama","doi":"10.3389/fhpcp.2024.1285349","DOIUrl":"https://doi.org/10.3389/fhpcp.2024.1285349","url":null,"abstract":"The GPU acceleration of explicit state space exploration, for explicit-state model checking, has been the subject of previous research, but to date, the tools have been limited in their applicability and in their practical use. Considering this research, to our knowledge, we are the first to use a novel tree database for GPUs. This novel tree database allows high-performant, memory-efficient storage of states in the form of binary trees. Besides the tree compression this enables, we also propose two new hashing schemes, compact-cuckoo and compact multiple-functions. These schemes enable the use of Cleary compression to compactly store tree roots. Besides an in-depth discussion of the tree database algorithms, the input language and workflow of our tool, called GPUexplore 3.0, are presented. Finally, we explain how the algorithms can be extended to exploit multiple GPUs that reside on the same machine. Experiments show single-GPU processing speeds of up to 144 million states per second compared to 20 million states achieved by 32-core LTSmin. In the multi-GPU setting, workload and storage distributions are optimal, and, frequently, performance is even positively impacted when the number of GPUs is increased. Overall, a logarithmic acceleration up to 1.9× was achieved with four GPUs, compared to what was achieved with one and two GPUs. We believe that a linear speedup can be easily accomplished with faster P2P communications between the GPUs.","PeriodicalId":399190,"journal":{"name":"Frontiers in High Performance Computing","volume":"137 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140247278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Asgard: Are NoSQL databases suitable for ephemeral data in serverless workloads?
Pub Date: 2023-09-04 | DOI: 10.3389/fhpcp.2023.1127883
Karthick Shankar, Ashraf Y. Mahgoub, Zihan Zhou, Utkarsh Priyam, S. Chaterji
Serverless computing platforms are becoming increasingly popular for data analytics applications due to their low management overhead and granular billing strategies. Such analytics frameworks use a Directed Acyclic Graph (DAG) structure, in which serverless functions, which are fine-grained tasks, are represented as nodes and data dependencies between the functions are represented as edges. Passing intermediate (ephemeral) data from one function to another has been receiving attention of late, with works proposing various storage systems and methods of optimizing them. The state-of-practice method is to pass the ephemeral data through remote storage, which is either disk-based (e.g., Amazon S3) and slow, or memory-based (e.g., ElastiCache Redis) and expensive. Despite the potential of some prominent NoSQL databases, like Apache Cassandra and ScyllaDB, which utilize both memory and disk, prevailing opinions suggest they are ill-suited for ephemeral data, being tailored more for long-term storage. In our study, titled Asgard, we rigorously examine this assumption. Using Amazon Web Services (AWS) as a testbed with two popular serverless applications, we explore scenarios like fanout and varying workloads, gauging the performance benefits of configuring NoSQL databases in a DAG-aware way. Surprisingly, we found that, in terms of end-to-end latency normalized by dollar cost, Apache Cassandra's default setup surpassed Redis by up to 326% and S3 by up to 189%. When optimized with Asgard, Cassandra outdid its own default configuration by up to 47%. This underscores specific instances where NoSQL databases can outshine the current state of practice.
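For readers who want to see what "passing ephemeral data through a NoSQL database" looks like in code, below is a minimal sketch using the DataStax Python driver for Cassandra. The contact point, keyspace, table, and column names are hypothetical, and the DAG-aware tuning that Asgard applies (replication, consistency, and compaction settings per DAG stage) is deliberately not shown.

```python
from cassandra.cluster import Cluster

# Hypothetical contact point and schema for intermediate DAG data; these are
# exactly the knobs a DAG-aware configuration would tune (replication factor,
# consistency level, compaction strategy, TTLs for short-lived data).
cluster = Cluster(["cassandra.internal.example"])
session = cluster.connect()
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS ephemeral
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS ephemeral.stage_output (
        dag_id text, stage text, shard int, payload blob,
        PRIMARY KEY ((dag_id, stage), shard)
    )
""")

def put_stage_output(dag_id: str, stage: str, shard: int, data: bytes) -> None:
    """Called at the end of an upstream serverless function."""
    session.execute(
        "INSERT INTO ephemeral.stage_output (dag_id, stage, shard, payload) "
        "VALUES (%s, %s, %s, %s)",
        (dag_id, stage, shard, data),
    )

def get_stage_output(dag_id: str, stage: str, shard: int) -> bytes:
    """Called at the start of the downstream function in the DAG."""
    row = session.execute(
        "SELECT payload FROM ephemeral.stage_output "
        "WHERE dag_id = %s AND stage = %s AND shard = %s",
        (dag_id, stage, shard),
    ).one()
    return row.payload
```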
{"title":"Asgard: Are NoSQL databases suitable for ephemeral data in serverless workloads?","authors":"Karthick Shankar, Ashraf Y. Mahgoub, Zihan Zhou, Utkarsh Priyam, S. Chaterji","doi":"10.3389/fhpcp.2023.1127883","DOIUrl":"https://doi.org/10.3389/fhpcp.2023.1127883","url":null,"abstract":"Serverless computing platforms are becoming increasingly popular for data analytics applications due to their low management overhead and granular billing strategies. Such analytics frameworks use a Directed Acyclic Graph (DAG) structure, in which serverless functions, which are fine-grained tasks, are represented as nodes and data-dependencies between the functions are represented as edges. Passing intermediate (ephemeral) data from one function to another has been receiving attention of late, with works proposing various storage systems and methods of optimization for them. The state-of-practice method is to pass the ephemeral data through remote storage, either disk-based (e.g., Amazon S3), which is slow, or memory-based (e.g., ElastiCache Redis), which is expensive. Despite the potential of some prominent NoSQL databases, like Apache Cassandra and ScyllaDB, which utilize both memory and disk, prevailing opinions suggest they are ill-suited for ephemeral data, being tailored more for long-term storage. In our study, titled Asgard, we rigorously examine this assumption. Using Amazon Web Services (AWS) as a testbed with two popular serverless applications, we explore scenarios like fanout and varying workloads, gauging the performance benefits of configuring NoSQL databases in a DAG-aware way. Surprisingly, we found that, per end-to-end latency normalized by $ cost, Apache Cassandra's default setup surpassed Redis by up to 326% and S3 by up to 189%. When optimized with Asgard, Cassandra outdid its own default configuration by up to 47%. This underscores specific instances where NoSQL databases can outshine the current state-of-practice.","PeriodicalId":399190,"journal":{"name":"Frontiers in High Performance Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131161215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SNDVI: a new scalable serverless framework to compute NDVI
Pub Date: 2023-08-25 | DOI: 10.3389/fhpcp.2023.1151530
Lucas Iacono, David Pacios, J. L. Vázquez-Poletti
Farmers and agronomists require crop health metrics to monitor plantations and detect problems like diseases or droughts at an early stage. This enables them to implement measures to address crop problems. The use of multispectral images and cloud computing is conducive to obtaining such metrics. Drones and satellites capture extensive multispectral image datasets, while the cloud facilitates the storage of these images and provides execution services for extracting crop health metrics, such as the Normalized Difference Vegetation Index (NDVI). The use of the cloud to compute NDVI poses new research challenges, such as determining which cloud technology offers the optimal balance of execution time and monetary cost. In this article, we present Serverless NDVI (SNDVI), a new framework based on serverless computing for NDVI computation. The objective of SNDVI is to minimize the monetary costs and computing times associated with using a public cloud while processing NDVI from large datasets. One of SNDVI's key contributions is to crop the dataset into subsegments, leveraging Lambda's ability to run up to 1,000 NDVI computing functions in parallel, one per subsegment. We deployed SNDVI using Amazon Lambda and conducted two experiments to analyze and validate its performance. Both experiments focused on two key metrics: (i) execution time and (ii) monetary cost. The first experiment involved executing SNDVI to extract NDVI from a multispectral dataset, with the objective of evaluating the overall SNDVI functionality, assessing its performance, and verifying the quality of its output. In the second experiment, we conducted a benchmarking analysis comparing SNDVI with an EC2-based NDVI computing architecture. Results from the first experiment demonstrated that the processing times for the entire SNDVI execution ranged from 9 to 15 seconds, with a total cost (including storage) of 4.19 USD. Results from the second experiment revealed that the monetary costs of EC2 and Lambda were similar, but SNDVI computed the NDVI 411 times faster than the EC2 architecture. In conclusion, the investigation reported in this paper demonstrates that SNDVI successfully achieves its goals and that serverless computing presents a promising alternative to traditional cloud services for NDVI computation.
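The per-pixel computation at the core of the framework is the standard index NDVI = (NIR - Red) / (NIR + Red). The sketch below shows this computation shaped as an AWS Lambda handler for one subsegment; the event fields, band layout, and return value are hypothetical, and the real SNDVI functions (including how subsegments are fetched from storage) are not reproduced here.

```python
import numpy as np

def compute_ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """NDVI = (NIR - Red) / (NIR + Red), guarding against division by zero."""
    nir = nir.astype(np.float32)
    red = red.astype(np.float32)
    denom = nir + red
    return np.divide(nir - red, denom, out=np.zeros_like(denom), where=denom != 0)

def handler(event, context):
    """Hypothetical Lambda entry point for one subsegment of a multispectral image.

    In the real framework each of the (up to 1,000) parallel invocations would
    load its subsegment from object storage; here the two bands arrive in the
    event as nested lists purely to keep the sketch self-contained.
    """
    nir = np.array(event["nir_band"])
    red = np.array(event["red_band"])
    ndvi = compute_ndvi(nir, red)
    return {"segment_id": event.get("segment_id"), "mean_ndvi": float(ndvi.mean())}
```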
{"title":"SNDVI: a new scalable serverless framework to compute NDVI","authors":"Lucas Iacono, David Pacios, J. L. Vázquez-Poletti","doi":"10.3389/fhpcp.2023.1151530","DOIUrl":"https://doi.org/10.3389/fhpcp.2023.1151530","url":null,"abstract":"Farmers and agronomists require crop health metrics to monitor plantations and detect problems like diseases or droughts at an early stage. This enables them to implement measures to address crop problems. The use of multispectral images and cloud computing is conducive to obtaining such metrics. Drones and satellites capture extensive multispectral image datasets, while the cloud facilitates the storage of these images and provides execution services for extracting crop health metrics, such as the Normalized Difference Vegetation Index (NDVI). The use of the Cloud to compute NDVI poses new research challenges, such as determining which cloud technology offers the optimal balance of execution time and monetary cost. In this article, we present Serverless NDVI (SNDVI), a new framework based on serverless computing for NDVI computation. The objective of SNDVI is to minimize the monetary costs and computing times associated with using a Public Cloud while processing NDVI from large datasets. One of SNDVI's key contributions is to crop the dataset into subsegments to leverage Lambda's ability to run up to 1,000 NDVI computing functions in parallel on each subsegment. We deployed SNDVI using Amazon Lambda and conducted two experiments to analyze and validate its performance. Both experiments focused on two key metrics: (i) execution time and (ii) monetary costs. The first experiment involved executing SNDVI to extract NDVI from a multispectral dataset. The objective was to evaluate the overall SNDVI functionality, assess its performance, and verify the quality of SNDVI output. In the second experiment, we conducted a benchmarking analysis comparing SNDVI with an EC2-based NDVI computing architecture. Results from the first experiment demonstrated that the processing times for the entire SNDVI execution ranged from 9 to 15 seconds, with a total cost (including storage) of 4.19 USD. Results from the second experiment revealed that the monetary costs of EC2 and Lambda were similar, but the computing time for SNDVI was 411 times faster than the EC2 architecture. In conclusion, the investigation reported in this paper demonstrates that SNDVI successfully achieves its goals and that Serverless Computing presents a promising native serverless alternative to traditional cloud services for NDVI computation.","PeriodicalId":399190,"journal":{"name":"Frontiers in High Performance Computing","volume":"113 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124181548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Auto-scaling edge cloud for network slicing
Pub Date: 2023-06-09 | DOI: 10.3389/fhpcp.2023.1167162
EmadelDin A. Mazied, Dimitrios S. Nikolopoulos, Y. Hanafy, S. Midkiff
This paper presents a study on resource control for autoscaling virtual radio access networks (RAN slices) in next-generation wireless networks. The dynamic instantiation and termination of on-demand RAN slices require efficient autoscaling of computational resources at the edge. Autoscaling involves vertical scaling (VS) and horizontal scaling (HS) to adapt resource allocation based on demand variations. However, the strict processing time requirements for RAN slices pose challenges when instantiating new containers. To address this issue, we propose removing resource limits from slice configuration and leveraging the decision-making capabilities of a centralized slicing controller. We introduce a resource control agent (RC) that determines resource limits as the number of computing resources packed into containers, aiming to minimize deployment costs while maintaining processing time below a threshold. The RAN slicing workload is modeled using the Low-Density Parity Check (LDPC) decoding algorithm, known for its stochastic demands. We formulate the problem as a variant of the stochastic bin packing problem (SBPP) to satisfy the random variations in radio workload. By employing chance-constrained programming, we approach the SBPP resource control (S-RC) problem. Our numerical evaluation demonstrates that S-RC maintains the processing time requirement with a higher probability compared to configuring RAN slices with predefined limits, although it introduces a 45% overall average cost overhead.
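For concreteness, a generic chance-constrained form of the stochastic bin packing problem can be written as below, where x_{ij} assigns workload i to container j, y_j marks a deployed container, C is the container capacity, and epsilon is the allowed violation probability. The notation and the Gaussian-demand deterministic equivalent are the common textbook form, stated here as an illustration rather than the paper's exact model.

```latex
\begin{align}
\min_{x,y}\;& \sum_{j} y_j
  && \text{(containers deployed)} \\
\text{s.t.}\;& \Pr\!\Big(\sum_{i} d_i\, x_{ij} \le C\Big) \ge 1 - \epsilon
  && \forall j \quad \text{(capacity / processing-time chance constraint)} \\
& \sum_{j} x_{ij} = 1 \;\; \forall i,
  \qquad x_{ij} \le y_j,
  \qquad x_{ij},\, y_j \in \{0,1\}.
\end{align}

% With independent Gaussian demands d_i ~ N(mu_i, sigma_i^2), the chance
% constraint has the standard deterministic equivalent:
\begin{equation}
\sum_{i} \mu_i\, x_{ij} \;+\; z_{1-\epsilon}
  \sqrt{\sum_{i} \sigma_i^{2}\, x_{ij}} \;\le\; C\, y_j \qquad \forall j,
\end{equation}
% where z_{1-epsilon} is the (1-epsilon) quantile of the standard normal distribution.
```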
{"title":"Auto-scaling edge cloud for network slicing","authors":"EmadelDin A. Mazied, Dimitrios S. Nikolopoulos, Y. Hanafy, S. Midkiff","doi":"10.3389/fhpcp.2023.1167162","DOIUrl":"https://doi.org/10.3389/fhpcp.2023.1167162","url":null,"abstract":"This paper presents a study on resource control for autoscaling virtual radio access networks (RAN slices) in next-generation wireless networks. The dynamic instantiation and termination of on-demand RAN slices require efficient autoscaling of computational resources at the edge. Autoscaling involves vertical scaling (VS) and horizontal scaling (HS) to adapt resource allocation based on demand variations. However, the strict processing time requirements for RAN slices pose challenges when instantiating new containers. To address this issue, we propose removing resource limits from slice configuration and leveraging the decision-making capabilities of a centralized slicing controller. We introduce a resource control agent (RC) that determines resource limits as the number of computing resources packed into containers, aiming to minimize deployment costs while maintaining processing time below a threshold. The RAN slicing workload is modeled using the Low-Density Parity Check (LDPC) decoding algorithm, known for its stochastic demands. We formulate the problem as a variant of the stochastic bin packing problem (SBPP) to satisfy the random variations in radio workload. By employing chance-constrained programming, we approach the SBPP resource control (S-RC) problem. Our numerical evaluation demonstrates that S-RC maintains the processing time requirement with a higher probability compared to configuring RAN slices with predefined limits, although it introduces a 45% overall average cost overhead.","PeriodicalId":399190,"journal":{"name":"Frontiers in High Performance Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129802851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}