
Latest Articles from IEEE Transactions on Parallel and Distributed Systems

Springald: GPU-Accelerated Window-Based Aggregates Over Out-of-Order Data Streams
IF 5.6 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-07-22 · DOI: 10.1109/TPDS.2024.3431611
Gabriele Mencagli;Patrizio Dazzi;Massimo Coppola
An increasing number of application domains require high-throughput processing to extract insights from massive data streams. The Data Stream Processing (DSP) paradigm provides formal approaches to analyze structured data streams considered as special, unbounded relations. The most widely used class of stateful operators in DSP is the one running sliding-window aggregation, which continuously extracts insights from the most recent portion of the stream. This article presents Springald, an efficient sliding-window operator leveraging GPU devices. Springald, incorporated in the WindFlow parallel library, processes out-of-order data streams with watermark propagation. These two features—GPU processing and out-of-orderliness—make Springald a novel contribution to this research area. This article describes the methodology behind Springald, its design and implementation. We also provide an extensive experimental evaluation to understand the behavior of Springald in depth, and we showcase its superior performance against state-of-the-art competitors.
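To make the watermark-driven mechanism described above concrete, the sketch below shows a minimal, CPU-only sliding-window sum over an out-of-order stream: each tuple is folded into every window it belongs to as it arrives, and a window is emitted only once the watermark guarantees that no earlier tuple can still show up. This illustrates the general pattern only, not Springald's GPU implementation; the window parameters and class names are invented.

    # Minimal CPU-only sketch of watermark-driven sliding-window aggregation
    # over an out-of-order stream (illustrative; not Springald's GPU code).
    from collections import defaultdict

    class SlidingSumOperator:
        def __init__(self, length, slide):
            self.length, self.slide = length, slide
            self.partials = defaultdict(lambda: [0, 0])   # window start -> [sum, count]

        def on_tuple(self, ts, value):
            # fold the tuple into every window [start, start+length) containing ts
            start = max(0, ((ts - self.length) // self.slide + 1) * self.slide)
            while start <= ts:
                acc = self.partials[start]
                acc[0] += value
                acc[1] += 1
                start += self.slide

        def on_watermark(self, wm):
            # a window is safe to emit once the watermark passes its end boundary
            closed = sorted(w for w in self.partials if w + self.length <= wm)
            return [(w, w + self.length, *self.partials.pop(w)) for w in closed]

    op = SlidingSumOperator(length=8, slide=4)
    for ts, val in [(5, 2), (1, 7), (9, 1), (3, 4), (11, 6)]:   # out-of-order arrivals
        op.on_tuple(ts, val)
    print(op.on_watermark(12))   # -> [(0, 8, 13, 3), (4, 12, 9, 3)]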
{"title":"Springald: GPU-Accelerated Window-Based Aggregates Over Out-of-Order Data Streams","authors":"Gabriele Mencagli;Patrizio Dazzi;Massimo Coppola","doi":"10.1109/TPDS.2024.3431611","DOIUrl":"10.1109/TPDS.2024.3431611","url":null,"abstract":"An increasing number of application domains require high-throughput processing to extract insights from massive data streams. The Data Stream Processing (DSP) paradigm provides formal approaches to analyze structured data streams considered as special, unbounded relations. The most used class of stateful operators in DSP are the ones running sliding-window aggregation, which continuously extracts insights from the most recent portion of the stream. This article presents \u0000<sc>Springald</small>\u0000, an efficient sliding-window operator leveraging GPU devices. \u0000<sc>Springald</small>\u0000, incorporated in the \u0000<sc>WindFlow</small>\u0000 parallel library, processes out-of-order data streams with watermarks propagation. These two features—GPU processing and out-of-orderliness—make \u0000<sc>Springald</small>\u0000 a novel contribution to this research area. This article describes the methodology behind \u0000<sc>Springald</small>\u0000, its design and implementation. We also provide an extensive experimental evaluation to understand the behavior of \u0000<sc>Springald</small>\u0000 deeply, and we showcase its superior performance against state-of-the-art competitors.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"35 9","pages":"1657-1671"},"PeriodicalIF":5.6,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10606093","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141772292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
IRIS: A Performance-Portable Framework for Cross-Platform Heterogeneous Computing
IF 5.6 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-07-19 · DOI: 10.1109/TPDS.2024.3429010
Jungwon Kim;Seyong Lee;Beau Johnston;Jeffrey S. Vetter
From edge to exascale, computer architectures are becoming more heterogeneous and complex. The systems typically have fat nodes, with multicore CPUs and multiple hardware accelerators such as GPUs, FPGAs, and DSPs. This complexity is causing a crisis in programming systems and performance portability. Several programming systems are working to address these challenges, but the increasing architectural diversity is forcing software stacks and applications to be specialized for each architecture. As we show, all of these approaches critically depend on their software framework for discovery, execution, scheduling, and data orchestration. To address this challenge, we believe that a more agile and proactive software framework is essential to increase performance portability and improve user productivity. To this end, we have designed and implemented IRIS: a performance-portable framework for cross-platform heterogeneous computing. IRIS can discover available resources, manage multiple diverse programming platforms (e.g., CUDA, Hexagon, HIP, Level Zero, OpenCL, OpenMP) simultaneously in the same execution, respect data dependencies, orchestrate data movement proactively, and provide for user-configurable scheduling. To simplify data movement, IRIS introduces a shared virtual device memory with relaxed consistency among different heterogeneous devices. IRIS also adds an automatic kernel workload partitioning technique using the polyhedral model so that it can resize kernels for a wide range of devices. Our evaluation on three architectures, ranging from Qualcomm Snapdragon to a Summit supercomputer node, shows that IRIS improves portability across a wide range of diverse heterogeneous architectures with negligible overhead.
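As a rough illustration of what "managing multiple diverse programming platforms in the same execution while respecting data dependencies" can look like, the toy runner below walks a small task graph in dependency order and hands each task to whichever backend advertises the capability it prefers. It is not the IRIS API; the backends, tags, and dispatch policy are all made up for the example.

    # Toy dependency-aware task runner dispatching to heterogeneous "backends".
    # Invented names; this is not the IRIS API.
    from collections import deque

    class Backend:
        def __init__(self, name, tags):
            self.name, self.tags = name, set(tags)
        def run(self, task_name):
            print(f"[{self.name}] running {task_name}")

    backends = [Backend("cuda-gpu", {"gpu"}), Backend("openmp-cpu", {"cpu"})]

    def pick_backend(prefers):
        # first backend advertising the preferred capability, else fall back to CPU
        for b in backends:
            if prefers in b.tags:
                return b
        return backends[-1]

    def run_graph(tasks):
        # tasks: name -> {"deps": [...], "prefers": "gpu" or "cpu"}
        done = set()
        ready = deque(n for n, t in tasks.items() if not t["deps"])
        while ready:
            name = ready.popleft()
            pick_backend(tasks[name]["prefers"]).run(name)
            done.add(name)
            for n, t in tasks.items():                      # release newly unblocked tasks
                if n not in done and n not in ready and all(d in done for d in t["deps"]):
                    ready.append(n)

    run_graph({
        "load":   {"deps": [],         "prefers": "cpu"},
        "kernel": {"deps": ["load"],   "prefers": "gpu"},
        "reduce": {"deps": ["kernel"], "prefers": "cpu"},
    })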
{"title":"IRIS: A Performance-Portable Framework for Cross-Platform Heterogeneous Computing","authors":"Jungwon Kim;Seyong Lee;Beau Johnston;Jeffrey S. Vetter","doi":"10.1109/TPDS.2024.3429010","DOIUrl":"10.1109/TPDS.2024.3429010","url":null,"abstract":"From edge to exascale, computer architectures are becoming more heterogeneous and complex. The systems typically have fat nodes, with multicore CPUs and multiple hardware accelerators such as GPUs, FPGAs, and DSPs. This complexity is causing a crisis in programming systems and performance portability. Several programming systems are working to address these challenges, but the increasing architectural diversity is forcing software stacks and applications to be specialized for each architecture. As we show, all of these approaches critically depend on their software framework for discovery, execution, scheduling, and data orchestration. To address this challenge, we believe that a more agile and proactive software framework is essential to increase performance portability and improve user productivity. To this end, we have designed and implemented IRIS: a performance-portable framework for cross-platform heterogeneous computing. IRIS can discover available resources, manage multiple diverse programming platforms (e.g., CUDA, Hexagon, HIP, Level Zero, OpenCL, OpenMP) simultaneously in the same execution, respect data dependencies, orchestrate data movement proactively, and provide for user-configurable scheduling. To simplify data movement, IRIS introduces a shared virtual device memory with relaxed consistency among different heterogeneous devices. IRIS also adds an automatic kernel workload partitioning technique using the polyhedral model so that it can resize kernels for a wide range of devices. Our evaluation on three architectures, ranging from Qualcomm Snapdragon to a Summit supercomputer node, shows that IRIS improves portability across a wide range of diverse heterogeneous architectures with negligible overhead.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"35 10","pages":"1796-1809"},"PeriodicalIF":5.6,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141743580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ElasticBatch: A Learning-Augmented Elastic Scheduling System for Batch Inference on MIG
IF 5.6 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-07-19 · DOI: 10.1109/TPDS.2024.3431189
Jiaxing Qi;Wencong Xiao;Mingzhen Li;Chaojie Yang;Yong Li;Wei Lin;Hailong Yang;Zhongzhi Luan;Depei Qian
As deep learning (DL) technologies become ubiquitous, GPU clusters are deployed for inference tasks with consistent service level objectives (SLOs). Efficiently utilizing multiple GPUs is crucial for throughput and cost-effectiveness. This article addresses the challenges posed by dynamic input and NVIDIA MIG in scheduling DL workloads. We present ElasticBatch, a scheduling system that simplifies configuration through bucketization and employs a machine learning-based pipeline to optimize settings. Our experiments demonstrate that ElasticBatch achieves a 50% reduction in GPU instances compared to MIG disablement, increases GPU utilization by 1.4% to 6.5% over an ideal scheduler and significantly reduces profiling time. This research contributes to the discourse on efficient utilization of GPU clusters. ElasticBatch's effectiveness in mitigating challenges posed by dynamic inputs and NVIDIA MIG underscores its potential to optimize GPU cluster performance, providing tangible benefits in terms of reduced instances, increased utilization, and significant time savings in real-world deployment scenarios.
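The bucketization idea mentioned in the abstract can be pictured as rounding each dynamically sized request up to one of a few fixed boundaries, so that batches are formed only from requests padded to the same shape and the configuration space the scheduler must search stays small. The sketch below is a hypothetical illustration; the bucket boundaries, batch limit, and function names are not taken from ElasticBatch.

    # Hypothetical bucketization: dynamic input sizes are rounded up to a small
    # set of boundaries so only a few (bucket, batch) shapes need profiling.
    import bisect

    BUCKETS = [32, 64, 128, 256, 512]          # assumed padded-length boundaries

    def bucket_of(seq_len):
        i = bisect.bisect_left(BUCKETS, seq_len)
        if i == len(BUCKETS):
            raise ValueError(f"sequence length {seq_len} exceeds the largest bucket")
        return BUCKETS[i]

    def group_into_batches(request_lengths, max_batch=8):
        # group requests by bucket, then cut each group into batches of max_batch
        by_bucket = {}
        for seq_len in request_lengths:
            by_bucket.setdefault(bucket_of(seq_len), []).append(seq_len)
        batches = []
        for bucket, group in sorted(by_bucket.items()):
            for i in range(0, len(group), max_batch):
                batches.append((bucket, group[i:i + max_batch]))
        return batches

    print(group_into_batches([30, 40, 70, 65, 500, 100, 33]))
    # -> [(32, [30]), (64, [40, 33]), (128, [70, 65, 100]), (512, [500])]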
{"title":"ElasticBatch: A Learning-Augmented Elastic Scheduling System for Batch Inference on MIG","authors":"Jiaxing Qi;Wencong Xiao;Mingzhen Li;Chaojie Yang;Yong Li;Wei Lin;Hailong Yang;Zhongzhi Luan;Depei Qian","doi":"10.1109/TPDS.2024.3431189","DOIUrl":"10.1109/TPDS.2024.3431189","url":null,"abstract":"As deep learning (DL) technologies become ubiquitous, GPU clusters are deployed for inference tasks with consistent service level objectives (SLOs). Efficiently utilizing multiple GPUs is crucial for throughput and cost-effectiveness. This article addresses the challenges posed by dynamic input and NVIDIA MIG in scheduling DL workloads. We present ElasticBatch, a scheduling system that simplifies configuration through bucketization and employs a machine learning-based pipeline to optimize settings. Our experiments demonstrate that ElasticBatch achieves a 50% reduction in GPU instances compared to MIG disablement, increases GPU utilization by 1.4% to 6.5% over an ideal scheduler and significantly reduces profiling time. This research contributes to the discourse on efficient utilization of GPU clusters. ElasticBatch's effectiveness in mitigating challenges posed by dynamic inputs and NVIDIA MIG underscores its potential to optimize GPU cluster performance, providing tangible benefits in terms of reduced instances, increased utilization, and significant time savings in real-world deployment scenarios.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"35 10","pages":"1708-1720"},"PeriodicalIF":5.6,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141743578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
HybRAID: A High-Performance Hybrid RAID Storage Architecture for Write-Intensive Applications in All-Flash Storage Systems
IF 5.6 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-07-19 · DOI: 10.1109/TPDS.2024.3429336
Maryam Karimi;Reza Salkhordeh;André Brinkmann;Hossein Asadi
With the ever-increasing demand for higher I/O performance and reliability in data-intensive applications, solid-state drives (SSDs) typically configured as redundant array of independent disks (RAID) are broadly used in enterprise all-flash storage systems. While a mirrored RAID offers higher performance in random access workloads, parity-based RAIDs (e.g., RAID5) provide higher performance in sequential accesses with less cost overhead. Previous studies try to address the poor performance of parity-based RAIDs in small writes (i.e., writes into a single disk) by offering various schemes, including caching or logging small writes. However, such techniques impose significant performance and/or reliability overheads and are seldom used in the industry. In addition, our empirical analysis shows that partial stripe writes, i.e., writing into a fraction of a full array in parity-based RAIDs, can significantly degrade the I/O performance, which has not been addressed in previous work. In this paper, we first offer an empirical study which reveals partial stripe writes reduce the performance of parity-based RAIDs by up to 6.85× compared to full stripe writes (i.e., writes into entire disks). Then, we propose a high-performance hybrid RAID storage architecture, called HybRAID, which is optimized for write-intensive applications. HybRAID exploits the advantages of mirror- and parity-based RAIDs to improve the write performance. HybRAID directs a) aligned full stripe writes to the parity-based RAID tier and b) small/partial stripe writes to the RAID1 tier. We propose an online migration scheme, which aims to move small/partial writes from parity-based RAID to RAID1, based on the access frequency of updates. As a complement, we further offer offline migration, whose aim is to make room in the fast tier for future references. Experimental results over enterprise SSDs show that HybRAID improves the performance of write-intensive applications by 3.3× and 2.6×, as well as enhancing performance per cost by 3.1× and 3.0×, compared to parity-based RAID and RAID10, respectively, at equivalent costs.
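The routing rule at the heart of the design can be illustrated with a few lines of logic: stripe-aligned, full-stripe writes go to the parity tier, everything else goes to the mirrored tier, and an access-frequency counter drives the online-migration decision. The stripe geometry, threshold, and tier names below are assumptions made for the example, not HybRAID's actual parameters.

    # Schematic write router: full, stripe-aligned writes -> parity tier;
    # small/partial-stripe writes -> mirrored (RAID1) tier. Assumed geometry.
    STRIPE_BYTES = 4 * 128 * 1024            # e.g. 4 data disks x 128 KiB chunks

    def route_write(offset, length):
        aligned = offset % STRIPE_BYTES == 0
        full_stripes = aligned and length > 0 and length % STRIPE_BYTES == 0
        return "parity-tier" if full_stripes else "raid1-tier"

    update_freq = {}                          # stripe id -> observed update count

    def online_migration_hint(stripe_id, threshold=16):
        # frequently updated stripes on the parity tier become migration
        # candidates for RAID1, mirroring the abstract's frequency-based policy
        update_freq[stripe_id] = update_freq.get(stripe_id, 0) + 1
        return "migrate-to-raid1" if update_freq[stripe_id] >= threshold else "stay-on-parity"

    print(route_write(0, STRIPE_BYTES))       # parity-tier (aligned full stripe)
    print(route_write(4096, 8192))            # raid1-tier  (partial stripe)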
{"title":"HybRAID: A High-Performance Hybrid RAID Storage Architecture for Write-Intensive Applications in All-Flash Storage Systems","authors":"Maryam Karimi;Reza Salkhordeh;André Brinkmann;Hossein Asadi","doi":"10.1109/TPDS.2024.3429336","DOIUrl":"10.1109/TPDS.2024.3429336","url":null,"abstract":"With the ever-increasing demand for higher I/O performance and reliability in data-intensive applications, \u0000<italic>solid-state drives</i>\u0000 (SSDs) typically configured as \u0000<italic>redundant array of independent disks</i>\u0000 (RAID) are broadly used in enterprise \u0000<italic>all-flash storage systems</i>\u0000. While a mirrored RAID offers higher performance in random access workloads, parity-based RAIDs (e.g., RAID5) provide higher performance in sequential accesses with less cost overhead. Previous studies try to address the poor performance of parity-based RAIDs in small writes (i.e., writes into a single disk) by offering various schemes, including caching or logging small writes. However, such techniques impose a significant performance and/or reliability overheads and are seldom used in the industry. In addition, our empirical analysis shows that partial stripe writes, i.e., writing into a fraction of a full array in parity-based RAIDs, can significantly degrade the I/O performance, which has \u0000<italic>not</i>\u0000 been addressed in the previous work. In this paper, we first offer an empirical study which reveals partial stripe writes reduce the performance of parity-based RAIDs by up to 6.85× compared to full stripe writes (i.e., writes into entire disks). Then, we propose a high-performance \u0000<underline>hyb</u>\u0000rid \u0000<underline>RAID</u>\u0000 storage architecture, called \u0000<italic>HybRAID</i>\u0000, which is optimized for write-intensive applications. HybRAID exploits the advantages of mirror- and parity-based RAIDs to improve the write performance. HybRAID directs a) \u0000<underline>aligned</u>\u0000 full stripe writes to parity-based RAID tier and b) small/partial stripe writes to the RAID1 tier. We propose an online migration scheme, which aims to move small/partial writes from parity-based RAID to RAID1, based on access frequency of updates. As a complement, we further offer offline migration, whose aim is to make room in the fast tier for future references. Experimental results over enterprise SSDs show that HybRAID improves the performance of write-intensive applications by 3.3× and 2.6×, as well as enhancing performance per cost by 3.1× and 3.0× compared to parity-based RAID and RAID10, respectively, at equivalent costs.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"35 12","pages":"2608-2623"},"PeriodicalIF":5.6,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141743579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
InSS: An Intelligent Scheduling Orchestrator for Multi-GPU Inference With Spatio-Temporal Sharing
IF 5.6 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-07-18 · DOI: 10.1109/TPDS.2024.3430063
Ziyi Han;Ruiting Zhou;Chengzhong Xu;Yifan Zeng;Renli Zhang
As the applications of AI proliferate, it is critical to increase the throughput of online DNN inference services. Multi-Process Service (MPS) improves the utilization of GPU resources through spatial sharing, but it also brings unique challenges. First, interference between co-located DNN models deployed on the same GPU must be accurately modeled. Second, inference tasks arrive dynamically online, and each task needs to be served within a bounded time to meet the service-level objective (SLO). Third, the problem of resource fragmentation becomes more serious. To address the above three challenges, we propose an Intelligent Scheduling orchestrator for multi-GPU inference servers with spatio-temporal Sharing (InSS), aiming to maximize the system throughput. InSS exploits two key innovations: i) an interference-aware latency analytical model which estimates the task latency; ii) a two-stage intelligent scheduler tailored to jointly optimize the model placement and GPU resource allocation, and to adaptively decide the batch size by coupling with the latency analytical model. Our prototype implementation on four NVIDIA A100 GPUs shows that InSS can improve the throughput by up to 86% compared to state-of-the-art GPU schedulers, while satisfying SLOs. We further show the scalability of InSS on 64 GPUs.
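A much-simplified version of interference-aware placement is sketched below: each model's latency is predicted to grow with the compute share already occupied on a GPU, and the model is placed on the GPU that still satisfies its SLO with the lowest predicted latency. The linear interference model, coefficients, and all numbers are illustrative assumptions, not InSS's learned latency model.

    # Toy interference-aware placement: predicted latency grows with the load
    # already co-located on a GPU; every coefficient here is an assumption.
    def predicted_latency(base_ms, gpu_load, alpha=1.8):
        return base_ms * (1.0 + alpha * gpu_load)

    def place(models, num_gpus=4):
        gpus = [0.0] * num_gpus                       # occupied compute share per GPU
        placement = {}
        for name, (base_ms, share, slo_ms) in models.items():
            best = None
            for g, load in enumerate(gpus):
                lat = predicted_latency(base_ms, load)
                if load + share <= 1.0 and lat <= slo_ms and (best is None or lat < best[1]):
                    best = (g, lat)
            if best is None:
                raise RuntimeError(f"{name}: no GPU can meet the SLO")
            placement[name] = best[0]
            gpus[best[0]] += share
        return placement

    models = {                 # name: (isolated latency ms, GPU share, SLO ms)
        "resnet": (12.0, 0.4, 40.0),
        "bert":   (25.0, 0.5, 80.0),
        "yolo":   (18.0, 0.4, 50.0),
    }
    print(place(models))       # e.g. {'resnet': 0, 'bert': 1, 'yolo': 2}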
{"title":"InSS: An Intelligent Scheduling Orchestrator for Multi-GPU Inference With Spatio-Temporal Sharing","authors":"Ziyi Han;Ruiting Zhou;Chengzhong Xu;Yifan Zeng;Renli Zhang","doi":"10.1109/TPDS.2024.3430063","DOIUrl":"10.1109/TPDS.2024.3430063","url":null,"abstract":"As the applications of AI proliferate, it is critical to increase the throughput of online DNN inference services. Multi-process service (MPS) improves the utilization rate of GPU resources by spatial-sharing, but it also brings unique challenges. First, interference between co-located DNN models deployed on the same GPU must be accurately modeled. Second, inference tasks arrive dynamically online, and each task needs to be served within a bounded time to meet the service-level objective (SLO). Third, the problem of fragments has become more serious. To address the above three challenges, we propose an \u0000<underline>In</u>\u0000telligent \u0000<underline>S</u>\u0000cheduling orchestrator for multi-GPU inference servers with spatio-temporal \u0000<underline>S</u>\u0000haring (\u0000<italic>InSS</i>\u0000), aiming to maximize the system throughput. \u0000<italic>InSS</i>\u0000 exploits two key innovations: i) An interference-aware latency analytical model which estimates the task latency. ii) A two-stage intelligent scheduler is tailored to jointly optimize the model placement, GPU resource allocation and adaptively decides batch size by coupling the latency analytical model. Our prototype implementation on four NVIDIA A100 GPUs shows that \u0000<italic>InSS</i>\u0000 can improve the throughput by up to 86% compared to the state-of-the-art GPU schedulers, while satisfying SLOs. We further show the scalability of \u0000<italic>InSS</i>\u0000 on 64 GPUs.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"35 10","pages":"1735-1748"},"PeriodicalIF":5.6,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141743581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Swift: Expedited Failure Recovery for Large-Scale DNN Training
IF 5.6 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-07-18 · DOI: 10.1109/TPDS.2024.3429625
Yuchen Zhong;Guangming Sheng;Juncheng Liu;Jinhui Yuan;Chuan Wu
As deep learning models grow larger and larger, training takes more time and resources, making fault tolerance increasingly critical. Existing state-of-the-art methods like CheckFreq and Elastic Horovod need to back up a copy of the model state (i.e., parameters and optimizer states) in memory, which is costly for large models and leads to non-trivial overhead. This article presents Swift, a novel recovery design for distributed deep neural network training that significantly reduces the failure recovery overhead without affecting training throughput and model accuracy. Instead of making an additional copy of the model state, Swift resolves the inconsistencies of the model state caused by the failure and exploits the replicas of the model state in data parallelism for failure recovery. We propose a logging-based approach for when replicas are unavailable, which records intermediate data and replays the computation to recover the lost state upon a failure. The re-computation is distributed across multiple machines to accelerate failure recovery further. We also log intermediate data selectively, exploring the trade-off between recovery time and intermediate data storage overhead. Evaluations show that Swift significantly reduces the failure recovery time and achieves similar or better training throughput during failure-free execution compared to state-of-the-art methods, without degrading final model accuracy. Swift can also achieve up to 1.16x speedup in total training time compared to state-of-the-art methods.
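The logging-and-replay idea can be reduced to a small toy: each step logs its input before computing, so when a failure occurs only the work after the last logged point is replayed instead of restoring a full in-memory copy of the model state. The stage function and failure injection below are hypothetical; the abstract's design additionally distributes the re-computation across machines and logs intermediate data selectively.

    # Toy logging-and-replay recovery: each step logs its input, so a failure
    # replays only from the last logged point (hypothetical stage function).
    log = {}                                   # step -> logged intermediate input

    def stage(step, x):
        return x * 2 + step                    # stand-in for one step's computation

    def train_step(step, x, fail_at):
        log[step] = x                          # logging the input is cheaper than a full state copy
        if step == fail_at:
            raise RuntimeError("simulated worker failure")
        return stage(step, x)

    def run(num_steps, fail_at=None):
        x, step = 1.0, 0
        while step < num_steps:
            try:
                x = train_step(step, x, fail_at)
                step += 1
            except RuntimeError:
                x = log[step]                  # replay from the logged input, not an old checkpoint
                fail_at = None                 # the toy failure is transient
        return x

    assert run(5) == run(5, fail_at=3)         # recovery matches the failure-free result
    print(run(5))                              # -> 58.0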
{"title":"Swift: Expedited Failure Recovery for Large-Scale DNN Training","authors":"Yuchen Zhong;Guangming Sheng;Juncheng Liu;Jinhui Yuan;Chuan Wu","doi":"10.1109/TPDS.2024.3429625","DOIUrl":"10.1109/TPDS.2024.3429625","url":null,"abstract":"As the size of deep learning models gets larger and larger, training takes longer time and more resources, making fault tolerance more and more critical. Existing state-of-the-art methods like CheckFreq and Elastic Horovod need to back up a copy of the model state (i.e., parameters and optimizer states) in memory, which is costly for large models and leads to non-trivial overhead. This article presents \u0000<sc>Swift</small>\u0000, a novel recovery design for distributed deep neural network training that significantly reduces the failure recovery overhead without affecting training throughput and model accuracy. Instead of making an additional copy of the model state, \u0000<sc>Swift</small>\u0000 resolves the inconsistencies of the model state caused by the failure and exploits the replicas of the model state in data parallelism for failure recovery. We propose a logging-based approach when replicas are unavailable, which records intermediate data and replays the computation to recover the lost state upon a failure. The re-computation is distributed across multiple machines to accelerate failure recovery further. We also log intermediate data selectively, exploring the trade-off between recovery time and intermediate data storage overhead. Evaluations show that \u0000<sc>Swift</small>\u0000 significantly reduces the failure recovery time and achieves similar or better training throughput during failure-free execution compared to state-of-the-art methods without degrading final model accuracy. \u0000<sc>Swift</small>\u0000 can also achieve up to 1.16x speedup in total training time compared to state-of-the-art methods.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"35 9","pages":"1644-1656"},"PeriodicalIF":5.6,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141743706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
STT-RAM-Based Hierarchical in-Memory Computing
IF 5.6 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-07-18 · DOI: 10.1109/TPDS.2024.3430853
Dhruv Gajaria;Kevin Antony Gomez;Tosiron Adegbija
In-memory computing promises to overcome the von Neumann bottleneck in computer systems by performing computations directly within the memory. Previous research has suggested using Spin-Transfer Torque RAM (STT-RAM) for in-memory computing due to its non-volatility, low leakage power, high density, endurance, and commercial viability. This paper explores hierarchical in-memory computing, where different levels of the memory hierarchy are augmented with processing elements to optimize workload execution. The paper investigates processing in memory (PiM) using non-volatile STT-RAM and processing in cache (PiC) using volatile STT-RAM with relaxed retention, which helps mitigate STT-RAM's write latency and energy overheads. We analyze tradeoffs and overheads associated with data movement for PiC versus write overheads for PiM using STT-RAMs for various workloads. We examine workload characteristics, such as computational intensity and CPU-dependent workloads with limited instruction-level parallelism, and their impact on PiC/PiM tradeoffs. Using these workloads, we evaluate computing in STT-RAM versus SRAM at different cache hierarchy levels and explore the potential of heterogeneous STT-RAM cache architectures with various retention times for PiC and CPU-based computing. Our experiments reveal significant advantages of STT-RAM-based PiC over PiM for specific workloads. Finally, we describe open research problems in hierarchical in-memory computing architectures to further enhance this paradigm.
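One way to frame the PiC-versus-PiM tradeoff the paper analyzes is as a simple cost comparison: CPU execution pays for moving data up the hierarchy, PiC pays smaller movement costs but higher STT-RAM write latency, and PiM avoids movement entirely at the highest write cost. Every number in the sketch below is an invented placeholder used only to show the shape of such a comparison, not a measured value from the paper.

    # Back-of-the-envelope tier choice: CPU pays data movement, PiC pays less
    # movement but slower STT-RAM writes, PiM pays only (the slowest) writes.
    # All latencies below are invented placeholders, not measured values.
    def cost_ns(bytes_moved, writes, move_ns_per_kb, write_ns):
        return bytes_moved / 1024 * move_ns_per_kb + writes * write_ns

    def choose_tier(bytes_touched, writes):
        cpu = cost_ns(bytes_touched, writes, move_ns_per_kb=120, write_ns=1)   # move up to SRAM/registers
        pic = cost_ns(bytes_touched, writes, move_ns_per_kb=40,  write_ns=4)   # relaxed-retention STT-RAM cache
        pim = cost_ns(0,             writes, move_ns_per_kb=0,   write_ns=12)  # non-volatile STT-RAM memory
        return min((("CPU", cpu), ("PiC", pic), ("PiM", pim)), key=lambda t: t[1])

    print(choose_tier(bytes_touched=1 << 20, writes=2_000))   # data-heavy kernel -> PiM wins
    print(choose_tier(bytes_touched=8 << 10, writes=50))      # small working set -> PiC wins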
{"title":"STT-RAM-Based Hierarchical in-Memory Computing","authors":"Dhruv Gajaria;Kevin Antony Gomez;Tosiron Adegbija","doi":"10.1109/TPDS.2024.3430853","DOIUrl":"10.1109/TPDS.2024.3430853","url":null,"abstract":"In-memory computing promises to overcome the von Neumann bottleneck in computer systems by performing computations directly within the memory. Previous research has suggested using \u0000<italic>Spin-Transfer Torque RAM (STT-RAM)</i>\u0000 for in-memory computing due to its non-volatility, low leakage power, high density, endurance, and commercial viability. This paper explores \u0000<italic>hierarchical in-memory computing</i>\u0000, where different levels of the memory hierarchy are augmented with processing elements to optimize workload execution. The paper investigates processing in memory (PiM) using non-volatile STT-RAM and processing in cache (PiC) using volatile STT-RAM with relaxed retention, which helps mitigate STT-RAM's write latency and energy overheads. We analyze tradeoffs and overheads associated with data movement for PiC versus write overheads for PiM using STT-RAMs for various workloads. We examine workload characteristics, such as computational intensity and CPU-dependent workloads with limited instruction-level parallelism, and their impact on PiC/PiM tradeoffs. Using these workloads, we evaluate computing in STT-RAM versus SRAM at different cache hierarchy levels and explore the potential of heterogeneous STT-RAM cache architectures with various retention times for PiC and CPU-based computing. Our experiments reveal significant advantages of STT-RAM-based PiC over PiM for specific workloads. Finally, we describe open research problems in hierarchical in-memory computing architectures to further enhance this paradigm.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"35 9","pages":"1615-1629"},"PeriodicalIF":5.6,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141743707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Reproducibility of the DaCe Framework on NPBench Benchmarks
IF 5.3 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-07-12 · DOI: 10.1109/tpds.2024.3427130
Anish Govind, Yuchen Jing, Stefanie Dao, Michael Granado, Rachel Handran, Davit Margarian, Matthew Mikhailov, Danny Vo, Matei-Alexandru Gardus, Khai Vu, Derek Bouius, Bryan Chin, Mahidhar Tatineni, Mary Thomas
{"title":"Reproducibility of the DaCe Framework on NPBench Benchmarks","authors":"Anish Govind, Yuchen Jing, Stefanie Dao, Michael Granado, Rachel Handran, Davit Margarian, Matthew Mikhailov, Danny Vo, Matei-Alexandru Gardus, Khai Vu, Derek Bouius, Bryan Chin, Mahidhar Tatineni, Mary Thomas","doi":"10.1109/tpds.2024.3427130","DOIUrl":"https://doi.org/10.1109/tpds.2024.3427130","url":null,"abstract":"","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"14 1","pages":""},"PeriodicalIF":5.3,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141614552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Cost-Effective Server Deployment for Multi-Access Edge Networks: A Cooperative Scheme
IF 5.6 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-07-11 · DOI: 10.1109/TPDS.2024.3426523
Rong Cong;Zhiwei Zhao;Linyuanqi Zhang;Geyong Min
The combination of 5G/6G and edge computing has been envisioned as a promising paradigm to empower pervasive and intensive computing for the Internet-of-Things (IoT). High deployment cost is one of the major obstacles for realizing 5G/6G edge computing. Most existing works tried to deploy the minimum number of edge servers to cover a target area by avoiding coverage overlaps. However, following this framework, the resource requirement per server will be drastically increased by the peak requirement during workload variations. Even worse, most resources will be left under-utilized for most of the time. To address this problem, we propose CoopEdge, a cost-effective server deployment scheme for cooperative multi-access edge computing. The key idea of CoopEdge is to allow deploying overlapped servers to handle variable requested workloads in a cooperative manner. In this way, the peak demands can be dispersed into multiple servers, and the resource requirement for each server can be greatly reduced. We propose a Two-step Incremental Deployment (TID) algorithm to jointly decide the server deployment and cooperation policies. For the scenarios involving multiple network operators that are unwilling to cooperate with each other, we further extend the TID algorithm to a distributed TID algorithm based on the game theory. Extensive evaluation experiments are conducted based on the measurement results of seven real-world edge applications. The results show that compared with the state-of-the-art work, CoopEdge significantly reduces the deployment cost by 38.7% and improves resource utilization by 36.2%, and the proposed distributed algorithm can achieve a comparable deployment cost with CoopEdge, especially for small-coverage servers.
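The benefit of overlapped, cooperative coverage can be seen with a tiny worked example: if a site's peak load can be split across every server that covers it, no single server has to be provisioned for the full peak on its own. The loads, coverage sets, and even-split policy below are made-up illustrations, not CoopEdge's TID algorithm.

    # Tiny worked example: with overlapped coverage, each site's peak load is
    # split across every server covering it, lowering per-server provisioning.
    # Loads, coverage sets, and the even split are assumptions for illustration.
    peak_load = {"siteA": 90, "siteB": 60, "siteC": 120}      # peak requests/s

    def required_capacity(coverage):
        # coverage: server -> set of covered sites; returns the largest per-server demand
        demand = {s: 0.0 for s in coverage}
        for site, load in peak_load.items():
            covering = [s for s, sites in coverage.items() if site in sites]
            for s in covering:
                demand[s] += load / len(covering)
        return max(demand.values())

    no_overlap  = {"s1": {"siteA"}, "s2": {"siteB"}, "s3": {"siteC"}}
    cooperative = {"s1": {"siteA", "siteC"}, "s2": {"siteB", "siteC"}, "s3": {"siteA", "siteC"}}
    print(required_capacity(no_overlap))    # 120.0: one server absorbs siteC's peak alone
    print(required_capacity(cooperative))   # 100.0: siteC's peak is dispersed across three servers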
{"title":"Cost-Effective Server Deployment for Multi-Access Edge Networks: A Cooperative Scheme","authors":"Rong Cong;Zhiwei Zhao;Linyuanqi Zhang;Geyong Min","doi":"10.1109/TPDS.2024.3426523","DOIUrl":"10.1109/TPDS.2024.3426523","url":null,"abstract":"The combination of 5G/6G and edge computing has been envisioned as a promising paradigm to empower pervasive and intensive computing for the Internet-of-Things (IoT). High deployment cost is one of the major obstacles for realizing 5G/6G edge computing. Most existing works tried to deploy the minimum number of edge servers to cover a target area by avoiding coverage overlaps. However, following this framework, the resource requirement per server will be drastically increased by the peak requirement during workload variations. Even worse, most resources will be left under-utilized for most of the time. To address this problem, we propose CoopEdge, a cost-effective server deployment scheme for cooperative multi-access edge computing. The key idea of CoopEdge is to allow deploying overlapped servers to handle variable requested workloads in a cooperative manner. In this way, the peak demands can be dispersed into multiple servers, and the resource requirement for each server can be greatly reduced. We propose a Two-step Incremental Deployment (TID) algorithm to jointly decide the server deployment and cooperation policies. For the scenarios involving multiple network operators that are unwilling to cooperate with each other, we further extend the TID algorithm to a distributed TID algorithm based on the game theory. Extensive evaluation experiments are conducted based on the measurement results of seven real-world edge applications. The results show that compared with the state-of-the-art work, CoopEdge significantly reduces the deployment cost by 38.7% and improves resource utilization by 36.2%, and the proposed distributed algorithm can achieve a comparable deployment cost with CoopEdge, especially for small-coverage servers.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"35 9","pages":"1583-1597"},"PeriodicalIF":5.6,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141609719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Adaptive QoS-Aware Microservice Deployment With Excessive Loads via Intra- and Inter-Datacenter Scheduling
IF 5.6 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-07-10 · DOI: 10.1109/TPDS.2024.3425931
Jiuchen Shi;Kaihua Fu;Jiawen Wang;Quan Chen;Deze Zeng;Minyi Guo
User-facing applications often experience excessive loads and are shifting towards the microservice architecture. To fully utilize heterogeneous resources, current datacenters have adopted the disaggregated storage and compute architecture, where the storage and compute clusters are suitable to deploy the stateful and stateless microservices, respectively. Moreover, when the local datacenter has insufficient resources to host excessive loads, a reasonable solution is moving some microservices to remote datacenters. However, it is nontrivial to decide the appropriate microservice deployment inside the local datacenter and identify the appropriate migration decision to remote datacenters, as microservices show different characteristics, and the local datacenter shows different resource contention situations. We therefore propose ELIS, an intra- and inter-datacenter scheduling system that ensures the Quality-of-Service (QoS) of the microservice application, while minimizing the network bandwidth usage and computational resource usage. ELIS comprises a resource manager, a cross-cluster microservice deployer, and a reward-based microservice migrator. The resource manager allocates near-optimal resources for microservices while ensuring QoS. The microservice deployer deploys the microservices between the storage and compute clusters in the local datacenter, to minimize the network bandwidth usage while satisfying the microservice resource demand. The microservice migrator migrates some microservices to remote datacenters when local resources cannot afford the excessive loads. Experimental results show that ELIS ensures the QoS of user-facing applications. Meanwhile, it reduces the public network bandwidth usage, the remote computational resource usage, and the local network bandwidth usage by 49.6%, 48.5%, and 60.7% on average, respectively.
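A highly simplified placement pass in the spirit of the deployer is sketched below: stateful microservices are kept on the storage cluster, stateless ones on the compute cluster, and anything that no longer fits locally is marked for migration to a remote datacenter. The capacities, service list, and spill-over rule are assumptions for illustration; ELIS additionally minimizes network bandwidth usage and makes reward-based migration decisions.

    # Simplified placement pass: stateful services -> storage cluster, stateless
    # -> compute cluster (falling back to storage), overflow -> remote datacenter.
    # Capacities and the service list are made-up examples, not ELIS's policy.
    local_free = {"storage": 8, "compute": 8}            # free cores per local cluster

    def place(services):
        plan = {}
        for name, (stateful, cores) in services.items():
            candidates = ["storage"] if stateful else ["compute", "storage"]
            for cluster in candidates:
                if local_free[cluster] >= cores:
                    local_free[cluster] -= cores
                    plan[name] = cluster
                    break
            else:
                plan[name] = "remote-datacenter"          # excessive load spills over
        return plan

    services = {            # name: (is_stateful, cores needed)
        "frontend":  (False, 4),
        "cart-db":   (True, 6),
        "search":    (False, 5),
        "orders-db": (True, 4),
    }
    print(place(services))
    # -> {'frontend': 'compute', 'cart-db': 'storage', 'search': 'remote-datacenter', 'orders-db': 'remote-datacenter'}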
{"title":"Adaptive QoS-Aware Microservice Deployment With Excessive Loads via Intra- and Inter-Datacenter Scheduling","authors":"Jiuchen Shi;Kaihua Fu;Jiawen Wang;Quan Chen;Deze Zeng;Minyi Guo","doi":"10.1109/TPDS.2024.3425931","DOIUrl":"10.1109/TPDS.2024.3425931","url":null,"abstract":"User-facing applications often experience excessive loads and are shifting towards the microservice architecture. To fully utilize heterogeneous resources, current datacenters have adopted the disaggregated storage and compute architecture, where the storage and compute clusters are suitable to deploy the stateful and stateless microservices, respectively. Moreover, when the local datacenter has insufficient resources to host excessive loads, a reasonable solution is moving some microservices to remote datacenters. However, it is nontrivial to decide the appropriate microservice deployment inside the local datacenter and identify the appropriate migration decision to remote datacenters, as microservices show different characteristics, and the local datacenter shows different resource contention situations. We therefore propose ELIS, an intra- and inter-datacenter scheduling system that ensures the Quality-of-Service (QoS) of the microservice application, while minimizing the network bandwidth usage and computational resource usage. ELIS comprises a \u0000<italic>resource manager</i>\u0000, a \u0000<italic>cross-cluster microservice deployer</i>\u0000, and a \u0000<italic>reward-based microservice migrator</i>\u0000. The resource manager allocates near-optimal resources for microservices while ensuring QoS. The microservice deployer deploys the microservices between the storage and compute clusters in the local datacenter, to minimize the network bandwidth usage while satisfying the microservice resource demand. The microservice migrator migrates some microservices to remote datacenters when local resources cannot afford the excessive loads. Experimental results show that ELIS ensures the QoS of user-facing applications. Meanwhile, it reduces the public network bandwidth usage, the remote computational resource usage, and the local network bandwidth usage by 49.6%, 48.5%, and 60.7% on average, respectively.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"35 9","pages":"1565-1582"},"PeriodicalIF":5.6,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141585924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0