
Latest publications in IEEE Transactions on Parallel and Distributed Systems

BCB-SpTC: An Efficient Sparse High-Dimensional Tensor Contraction Employing Tensor Core Acceleration
IF 5.6 | CAS Region 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-10-10 | DOI: 10.1109/TPDS.2024.3477746
Rong Hu;Haotian Wang;Wangdong Yang;Renqiu Ouyang;Keqin Li;Kenli Li
Sparse tensor contraction (SpTC) is an important operator in tensor networks. It tends to generate large amounts of sparse high-dimensional data, placing high demands on the computational performance and storage bandwidth of the processor. GPUs, with their powerful arithmetic capability, are a reliable choice for accelerating SpTC; however, the high dimensionality and sparsity of tensors leave GPU-accelerated SpTC operators with low computational intensity and high memory consumption. The recent introduction of Tensor Core Units (TCUs) on GPUs brings even more arithmetic power, which exacerbates the memory-wall problem. To cope with these challenges, this paper proposes a new BCB format that linearizes the indices of multidimensional blocks to reduce block-index accesses and uses a bitmap to store the distribution of non-zero elements within a block to reduce storage overhead. A parallel blocking algorithm of BCB-SpTC is designed that divides the binary linear indices into free and contracted indices to reduce the pairing overhead of computational tasks. Then, based on the characteristic computation pattern of TCUs, a dedicated filling method is designed to overcome the inefficiency of parallel computation over sparse data on TCUs. Finally, experimental results on an A100 GPU show that BCB-SpTC improves the speedup ratio by $1.1\times$ to $21.3\times$ over the existing SpTC GPU method.
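The bitmap-block idea at the heart of the BCB format can be illustrated with a short sketch. The following Python is a hypothetical rendering of the general technique (linearized block indices plus a per-block bitmap of non-zeros), not the authors' GPU implementation; the function names and row-major layout are assumptions.

```python
# Hypothetical sketch of the BCB idea (not the authors' GPU code): each
# fixed-size block of a sparse tensor is stored as a linearized block index,
# a bitmap of occupied slots, and the packed non-zero values.

def linearize(index, dims):
    """Flatten a multidimensional block index into one integer (row-major)."""
    flat = 0
    for i, d in zip(index, dims):
        flat = flat * d + i
    return flat

def encode_block(block_values):
    """Compress a small dense block into (bitmap, packed non-zeros).

    The bitmap costs one bit per slot, far less than storing one
    coordinate tuple per non-zero element.
    """
    bitmap, nonzeros = 0, []
    for slot, v in enumerate(block_values):
        if v != 0:
            bitmap |= 1 << slot
            nonzeros.append(v)
    return bitmap, nonzeros

def decode_block(bitmap, nonzeros, num_slots):
    """Reconstruct the dense block from its bitmap encoding."""
    out, k = [], 0
    for slot in range(num_slots):
        if bitmap >> slot & 1:
            out.append(nonzeros[k])
            k += 1
        else:
            out.append(0)
    return out
```

A block index `(1, 2)` in a `4x8` grid of blocks linearizes to `1*8 + 2 = 10`, so block lookups compare one integer instead of one coordinate per dimension.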
Citations: 0
MoltDB: Accelerating Blockchain via Ancient State Segregation
IF 5.6 | CAS Region 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-10-10 | DOI: 10.1109/TPDS.2024.3467927
Junyuan Liang;Wuhui Chen;Zicong Hong;Haogang Zhu;Wangjie Qiu;Zibin Zheng
Blockchains store states in Log-Structured Merge (LSM) tree-based databases. Due to blockchain traceability, the ever-growing ancient states are inevitably stored in these databases. Unfortunately, by default this process mixes current and ancient states in the data layout, incurring unnecessary disk I/O and slowing transaction execution. This paper proposes MoltDB, a scalable LSM-based database for efficient transaction execution built on a novel idea of ancient state segregation, i.e., segregating current and ancient states in the data layout. However, ancient states are generated frequently and accessed unpredictably, which makes the segregation challenging. Thus, we develop an "extract-compact" mechanism that batches the extraction of frequently generated ancient states into the LSM compaction process to relieve the additional disk I/O overhead. Moreover, we design an adaptive LSM-based storage for the unpredictably accessed ancient states so that extracted states can be served on demand. We implement MoltDB as a database engine compatible with many mainstream blockchains and integrate it into Ethereum for evaluation. Experimental results show that MoltDB achieves 1.3× the transaction throughput and 30% disk I/O latency savings over state-of-the-art works.
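The ancient-state-segregation idea can be sketched in a few lines. The following Python is an illustrative model of the general approach (a hot store for current state, batched extraction of superseded versions into a cold store), not MoltDB's actual engine; all names and the batching rule are assumptions.

```python
# Illustrative model of ancient state segregation (an assumption in spirit of
# MoltDB, not its engine): the latest value per key lives in a hot store, and
# superseded versions are extracted in batches into a separate cold store so
# current and ancient states never mix in one data layout.

class SegregatedStateStore:
    def __init__(self, batch_size=2):
        self.current = {}        # hot store: latest (value, block) per key
        self.ancient = {}        # cold store: key -> list of old versions
        self._pending = []       # superseded versions awaiting a batch flush
        self.batch_size = batch_size

    def put(self, key, value, block):
        if key in self.current:
            # The old version becomes "ancient": extract it into the batch
            # instead of leaving it mixed into the hot data layout.
            self._pending.append((key,) + self.current[key])
            if len(self._pending) >= self.batch_size:
                self._flush()
        self.current[key] = (value, block)

    def _flush(self):
        """Batched move of superseded versions into the cold store."""
        for key, value, block in self._pending:
            self.ancient.setdefault(key, []).append((value, block))
        self._pending.clear()

    def get_current(self, key):
        return self.current[key][0]
```

Batching the moves amortizes the extra writes, which is the same reason the paper piggybacks extraction on LSM compaction rather than moving each stale version eagerly.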
Citations: 0
FastLoad: Speeding Up Data Loading of Both Sparse Matrix and Vector for SpMV on GPUs
IF 5.6 | CAS Region 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-10-09 | DOI: 10.1109/TPDS.2024.3477431
Jinyu Hu;Huizhang Luo;Hong Jiang;Guoqing Xiao;Kenli Li
Sparse Matrix-Vector Multiplication (SpMV) on GPUs has gained significant attention because of SpMV's importance in modern applications and the increasing computing power of GPUs over the last decade. Previous studies have emphasized the importance of data loading for the overall performance of SpMV and demonstrated the efficacy of coalesced memory access in enhancing data loading efficiency. However, existing approaches fall far short of reaching the full potential of data loading on modern GPUs. In this paper, we propose an efficient algorithm called FastLoad that speeds up the loading of both sparse matrices and input vectors of SpMV on modern GPUs. Leveraging coalesced memory access, FastLoad achieves high loading efficiency and load balance by sorting both the columns of the sparse matrix and the elements of the input vector based on the number of non-zero elements, while organizing non-zero elements in blocks to avoid thread divergence. FastLoad takes the Compressed Sparse Column (CSC) format as an implementation case to prove the concept and gain insights. We conduct a comprehensive comparison of FastLoad with CSC-based SpMV, cuSPARSE, CSR5, and TileSpMV, using the full SuiteSparse Matrix Collection as the workload. The experimental results on an RTX 3090 Ti demonstrate that our method outperforms the others on most matrices, with geometric-mean speedups over CSC-based SpMV, cuSPARSE, CSR5, and TileSpMV of 2.12×, 2.98×, 2.88×, and 1.22×, respectively.
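The reordering step can be pictured with a small sketch. The following Python is an assumed, simplified rendition of sorting CSC columns (and the matching input-vector entries) by non-zero count so that adjacent workers handle similarly sized columns; the real system runs as GPU kernels with blocked layouts.

```python
# Simplified sketch (not FastLoad's CUDA code) of its reordering idea: permute
# the columns of a CSC matrix, together with the matching entries of the input
# vector x, so that columns are ordered by non-zero count. Threads that process
# neighbouring columns then do similar amounts of work.

def sort_columns_by_nnz(col_ptr, row_idx, vals, x):
    """Return (col_ptr, row_idx, vals, x, order) with columns sorted by nnz."""
    ncols = len(col_ptr) - 1
    order = sorted(range(ncols),
                   key=lambda j: col_ptr[j + 1] - col_ptr[j],
                   reverse=True)
    new_ptr, new_rows, new_vals, new_x = [0], [], [], []
    for j in order:
        lo, hi = col_ptr[j], col_ptr[j + 1]
        new_rows.extend(row_idx[lo:hi])   # copy the column's row indices
        new_vals.extend(vals[lo:hi])      # ...and its values, contiguously
        new_ptr.append(new_ptr[-1] + (hi - lo))
        new_x.append(x[j])                # keep x aligned with the new order
    return new_ptr, new_rows, new_vals, new_x, order
```

Because `x` is permuted along with the columns, the SpMV result `y` is unchanged by the reordering; only the memory-access pattern improves.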
Citations: 0
Competitive Analysis of Online Elastic Caching of Transient Data in Multi-Tiered Content Delivery Network
IF 5.6 | CAS Region 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-10-07 | DOI: 10.1109/TPDS.2024.3475412
Binghan Wu;Wei Bao;Bing Bing Zhou
As the demand for faster and more reliable content delivery escalates, Content Delivery Networks (CDNs) face significant challenges in managing content placement across their increasingly complex, multi-tiered structures: they must balance performance, complexity, and scalability while addressing the transient nature of data and the unpredictability of internet traffic. To address these challenges, this study introduces a novel multi-tier CDN caching strategy that navigates spatial and temporal trade-offs in cache placement, considering that the cache placement cost diminishes with the content lifetime and that future data demands are uncertain. We design a distributed online algorithm that evaluates each incoming request and places new caches when the total content delivery cost exceeds a threshold. Our competitive analysis shows a tight and optimal $\mathtt{Tiers}+1$ competitive ratio. Additionally, our algorithm has low complexity, passing only $O(\mathtt{Tiers})$ reference messages per request, which enhances its practical applicability. Empirical validation through numerical simulations and trace-driven experiments confirms the superiority of our approach over existing benchmarks in real-world CDN settings.
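The threshold rule described above follows the classic ski-rental pattern; a one-tier sketch (our simplification, not the paper's multi-tier algorithm; all names are assumptions) makes the trade-off concrete.

```python
# Hedged one-tier sketch of a threshold-based online caching rule (a
# simplification, not the paper's multi-tier algorithm): serve requests
# remotely, accumulating their delivery cost; once that total reaches the
# cache placement cost, install a local copy, after which requests are free.

def online_cache(requests, fetch_cost, place_cost):
    """Return (total cost, index of the request that triggered placement)."""
    total, accrued, placed_at = 0.0, 0.0, None
    for i, _ in enumerate(requests):
        if placed_at is not None:
            continue                  # local hit: no delivery cost
        total += fetch_cost
        accrued += fetch_cost
        if accrued >= place_cost:     # threshold reached: place the cache
            total += place_cost
            placed_at = i
    return total, placed_at
```

In this single-tier toy setting the rule pays at most twice the offline optimum (which picks the cheaper of always-fetching and placing immediately), matching the $\mathtt{Tiers}+1$ ratio for $\mathtt{Tiers}=1$.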
Citations: 0
A Survey on Performance Modeling and Prediction for Distributed DNN Training
IF 5.6 | CAS Region 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-10-07 | DOI: 10.1109/TPDS.2024.3476390
Zhenhua Guo;Yinan Tang;Jidong Zhai;Tongtong Yuan;Jian Jin;Li Wang;Yaqian Zhao;Rengang Li
The recent breakthroughs in large-scale DNNs have attracted significant attention from both academia and industry toward distributed DNN training techniques. Because large-scale distributed DNN training is time-consuming and expensive to execute, it is crucial to model and predict the performance of distributed DNN training before its actual deployment, in order to optimize the design of distributed DNN training at low cost. This paper analyzes and emphasizes the importance of modeling and predicting the performance of distributed DNN training, categorizes and analyzes the related state-of-the-art works, and discusses future challenges and opportunities for this research field. The objectives of this paper are twofold: first, to assist researchers in understanding and choosing suitable modeling and prediction tools for large-scale distributed DNN training, and second, to encourage researchers to propose further valuable research on performance modeling and prediction for distributed DNN training.
Citations: 0
TrieKV: A High-Performance Key-Value Store Design With Memory as Its First-Class Citizen
IF 5.6 | CAS Region 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-10-02 | DOI: 10.1109/TPDS.2024.3473013
Hui Sun;Deyan Kong;Song Jiang;Yinliang Yue;Xiao Qin
Key-value (KV) stores based on the log-structured merge tree (LSM-tree) have been extensively studied and deployed in major information technology infrastructures. Because this type of system is designed around KV stores accessing disks, limited disk bandwidth makes serving online data requests difficult. One solution is to use a large DRAM so that frequently accessed KV pairs are buffered in and served from main memory; this solution, however, exposes a major design drawback of the KV store: its lack of support for integrated data management in memory and on disks. For example, data in the most popular LSM-tree implementation, RocksDB, may reside in a small write buffer (MemTable) that organizes KV pairs for disk writes, a buffer cache for disk blocks, a write-ahead log on disk for data persistence, and various LSM levels on disk. Without integrated management of indexes, data, and their persistence in a hierarchical memory/disk architecture, memory is under-utilized and performance optimization opportunities are missed. We propose a KV store, TrieKV, which holistically incorporates DRAM, persistent memory (PMem), and disk with the following desired features: (1) fast in-memory access; (2) accurate identification of hot/cold data at an adaptable granularity; (3) customized memory-space allocation for minimized fragmentation; (4) hotness-aware data placement across the storage hierarchy; (5) in-place data persistence in the PMem; and (6) hotness-aware LSM-tree compaction. TrieKV employs a single, integrated trie-structured index for all KV pairs in memory, where access hotness can be consistently discovered. Accordingly, KV placement is determined dynamically according to the hotness and persistence needs of the storage hierarchy spanning DRAM, PMem, and solid-state drives. Our experiments demonstrate that the 99th-percentile latency of RocksDB and NoveLSM is 38× and 6× higher than that of TrieKV, respectively. In addition, TrieKV outperforms RocksDB and NoveLSM in throughput by factors of 5.6 and 1.7, respectively.
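A trie-structured index with per-pair hotness tracking can be illustrated with a small sketch. This is an assumption in the spirit of TrieKV, not its implementation: keys share prefix paths, and each leaf carries an access counter so hot and cold pairs can be told apart.

```python
# Illustrative trie index with hotness tracking (names, leaf layout, and the
# hotness threshold are assumptions, not TrieKV's design). Keys share prefix
# nodes; each leaf stores the value plus an access counter that a placement
# policy could consult when deciding what stays in DRAM.

class TrieIndex:
    def __init__(self):
        self.root = {}

    def put(self, key, value):
        node = self.root
        for ch in key:
            node = node.setdefault(ch, {})
        node["$"] = [value, 0]        # leaf slot: [value, access count]

    def get(self, key):
        node = self.root
        for ch in key:
            node = node.get(ch)
            if node is None:
                return None
        leaf = node.get("$")
        if leaf is None:
            return None
        leaf[1] += 1                  # record the access for hotness tracking
        return leaf[0]

    def hot_keys(self, threshold):
        """Keys accessed at least `threshold` times (DRAM-resident candidates)."""
        out = []

        def walk(node, prefix):
            for ch, child in node.items():
                if ch == "$":
                    if child[1] >= threshold:
                        out.append(prefix)
                else:
                    walk(child, prefix + ch)

        walk(self.root, "")
        return sorted(out)
```

Because the counter lives on the same index path used for lookups, hotness is discovered as a side effect of normal access rather than by a separate scan.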
Citations: 0
TARIS: Scalable Incremental Processing of Time-Respecting Algorithms on Streaming Graphs
IF 5.6 | CAS Region 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-09-30 | DOI: 10.1109/TPDS.2024.3471574
Ruchi Bhoot;Suved Sanjay Ghanmode;Yogesh Simmhan
Temporal graphs change over time, with a lifespan associated with each vertex and edge. These graphs are suited to time-respecting algorithms, in which the traversed edges must have monotonic timestamps. The Interval-centric Computing Model (ICM) is a distributed programming abstraction for designing such temporal algorithms. There has been little work on supporting time-respecting algorithms at large scales on streaming graphs, which are updated continuously at high rates (millions of updates per second), such as in financial and social networks. In this article, we extend the windowed variant of ICM for incremental computing over streaming graph updates. We formalize the properties of temporal graph algorithms and prove that our model of incremental computing over streaming updates is equivalent to batch execution of ICM. We design TARIS, a novel distributed graph platform that implements these incremental computing features. We use efficient data structures to reduce memory accesses and enhance locality during graph updates. We also propose scheduling strategies to interleave updates with computing, and streaming strategies to adapt the execution window for incremental computing to variable input rates. Our detailed and rigorous evaluation of temporal algorithms on large-scale graphs with up to $2\,\text{B}$ edges shows that TARIS outperforms contemporary baselines, Tink and Gradoop, by 3–4 orders of magnitude, and handles high input rates of $83\,\text{k}$–$587\,\text{M}$ mutations/s with latencies on the order of seconds to minutes.
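The monotonic-timestamp constraint that defines a time-respecting algorithm can be shown with a minimal earliest-arrival sketch. This is illustrative of the constraint only (TARIS itself is a distributed incremental platform); the graph encoding and function name are assumptions.

```python
# Minimal time-respecting reachability sketch: starting from `source` at
# `start_time`, an edge (u, v, t) may be traversed only if its timestamp t is
# no earlier than the time u was reached, so timestamps along any path are
# monotonically non-decreasing.
from collections import deque

def earliest_arrival(edges, source, start_time):
    """edges: dict u -> list of (v, timestamp). Returns {vertex: earliest arrival}."""
    arrival = {source: start_time}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, t in edges.get(u, []):
            # Time-respecting rule: t >= arrival[u]; keep only improvements.
            if t >= arrival[u] and t < arrival.get(v, float("inf")):
                arrival[v] = t
                queue.append(v)
    return arrival
```

Note that a vertex can be reachable in the static graph yet unreachable time-respectingly: if `b` is reached at time 2, an edge out of `b` stamped 1 is unusable.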
引用次数: 0
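The TARIS abstract hinges on time-respecting traversal: an algorithm may only extend a path along edges whose timestamps are monotonically non-decreasing. As an illustrative sketch of that property (not the TARIS or ICM implementation — the function name and edge encoding below are our own), earliest-arrival reachability on a temporal graph can be computed with a Dijkstra-style scan:

```python
from collections import defaultdict
import heapq

def time_respecting_reachable(edges, source, t_start=0):
    """Earliest-arrival reachability on a temporal graph.

    edges: list of (u, v, t) meaning edge u->v is available at time t.
    Returns {vertex: earliest arrival time} for every vertex reachable
    from `source` along paths with non-decreasing edge timestamps.
    """
    adj = defaultdict(list)
    for u, v, t in edges:
        adj[u].append((t, v))
    arrival = {source: t_start}
    heap = [(t_start, source)]  # (arrival time, vertex)
    while heap:
        t_u, u = heapq.heappop(heap)
        if t_u > arrival.get(u, float("inf")):
            continue  # stale heap entry
        for t_e, v in adj[u]:
            # time-respecting: the edge's timestamp must not precede
            # the time we arrived at u
            if t_e >= t_u and t_e < arrival.get(v, float("inf")):
                arrival[v] = t_e
                heapq.heappush(heap, (t_e, v))
    return arrival
```

Note how a vertex can be topologically reachable yet temporally unreachable: if every outgoing edge of a vertex is timestamped before our arrival there, the path dies — exactly the distinction that separates temporal algorithms from static graph traversal.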
Distributed Task Processing Platform for Infrastructure-Less IoT Networks: A Multi-Dimensional Optimization Approach 无基础设施物联网网络的分布式任务处理平台:多维优化方法
IF 5.6 2区 计算机科学 Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date : 2024-09-27 DOI: 10.1109/TPDS.2024.3469545
Qiushi Zheng;Jiong Jin;Zhishu Shen;Libing Wu;Iftekhar Ahmad;Yong Xiang
With the rapid development of artificial intelligence (AI) and the Internet of Things (IoT), intelligent information services have showcased unprecedented capabilities in acquiring and analysing information. The conventional task processing platforms rely on centralised Cloud processing, which encounters challenges in infrastructure-less environments with unstable or disrupted electrical grids and cellular networks. These challenges hinder the deployment of intelligent information services in such environments. To address these challenges, we propose a distributed task processing platform (${DTPP}$) designed to provide satisfactory performance for executing computationally intensive applications in infrastructure-less environments. This platform leverages numerous distributed homogeneous nodes to process the arriving task locally or collaboratively. Based on this platform, a distributed task allocation algorithm is developed to achieve high task processing performance with limited energy and bandwidth resources. To validate our approach, ${DTPP}$ has been tested in an experimental environment utilising real-world experimental data to simulate IoT network services in infrastructure-less environments. Extensive experiments demonstrate that our proposed solution surpasses comparative algorithms in key performance metrics, including task processing ratio, task processing accuracy, algorithm processing time, and energy consumption.
随着人工智能(AI)和物联网(IoT)的快速发展,智能信息服务在获取和分析信息方面展现出前所未有的能力。传统的任务处理平台依赖于集中式云处理,这在电网和蜂窝网络不稳定或中断的无基础设施环境中遇到了挑战。这些挑战阻碍了智能信息服务在此类环境中的部署。为应对这些挑战,我们提出了分布式任务处理平台({DTPP}$),旨在为在无基础设施环境中执行计算密集型应用提供令人满意的性能。该平台利用众多分布式同构节点,对到达的任务进行本地或协作处理。基于该平台,我们开发了一种分布式任务分配算法,以便在能源和带宽资源有限的情况下实现较高的任务处理性能。为了验证我们的方法,我们在实验环境中测试了 ${DTPP}$,利用真实世界的实验数据来模拟无基础设施环境中的物联网网络服务。广泛的实验证明,我们提出的解决方案在关键性能指标(包括任务处理率、任务处理准确性、算法处理时间和能耗)上超越了同类算法。
{"title":"Distributed Task Processing Platform for Infrastructure-Less IoT Networks: A Multi-Dimensional Optimization Approach","authors":"Qiushi Zheng;Jiong Jin;Zhishu Shen;Libing Wu;Iftekhar Ahmad;Yong Xiang","doi":"10.1109/TPDS.2024.3469545","DOIUrl":"https://doi.org/10.1109/TPDS.2024.3469545","url":null,"abstract":"With the rapid development of artificial intelligence (AI) and the Internet of Things (IoT), intelligent information services have showcased unprecedented capabilities in acquiring and analysing information. The conventional task processing platforms rely on centralised Cloud processing, which encounters challenges in infrastructure-less environments with unstable or disrupted electrical grids and cellular networks. These challenges hinder the deployment of intelligent information services in such environments. To address these challenges, we propose a distributed task processing platform (\u0000<inline-formula><tex-math>${DTPP}$</tex-math></inline-formula>\u0000) designed to provide satisfactory performance for executing computationally intensive applications in infrastructure-less environments. This platform leverages numerous distributed homogeneous nodes to process the arriving task locally or collaboratively. Based on this platform, a distributed task allocation algorithm is developed to achieve high task processing performance with limited energy and bandwidth resources. To validate our approach, \u0000<inline-formula><tex-math>${DTPP}$</tex-math></inline-formula>\u0000 has been tested in an experimental environment utilising real-world experimental data to simulate IoT network services in infrastructure-less environments. 
Extensive experiments demonstrate that our proposed solution surpasses comparative algorithms in key performance metrics, including task processing ratio, task processing accuracy, algorithm processing time, and energy consumption.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"35 12","pages":"2392-2404"},"PeriodicalIF":5.6,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142438538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
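The DTPP abstract describes allocating tasks across homogeneous nodes under energy and bandwidth limits. The paper's multi-dimensional optimizer is not public here, so the following is only a minimal greedy sketch of the allocation idea, with invented cost fields (`capacity`, `energy_per_cycle`) standing in for whatever resource model the platform actually uses:

```python
def allocate_tasks(tasks, nodes):
    """Greedy allocation sketch: each task is (cpu_cycles, deadline_s);
    each node is a dict with 'capacity' (cycles/s) and 'energy_per_cycle'.
    Assign each task to the feasible node with the lowest energy cost,
    where a task is feasible on a node if it finishes within its deadline
    given the work already committed to that node.
    Returns a list of (task_index, node_index), node_index None if dropped.
    """
    assignment = []
    load = [0.0] * len(nodes)  # cycles already committed per node
    for i, (cycles, deadline) in enumerate(tasks):
        best, best_energy = None, float("inf")
        for j, n in enumerate(nodes):
            finish = (load[j] + cycles) / n["capacity"]  # FIFO finish time, s
            energy = cycles * n["energy_per_cycle"]
            if finish <= deadline and energy < best_energy:
                best, best_energy = j, energy
        if best is not None:
            load[best] += cycles
        assignment.append((i, best))
    return assignment
```

A greedy pass like this trades optimality for speed; the paper's distributed algorithm additionally coordinates neighboring nodes, which a per-task greedy loop cannot capture.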
GeoDeploy: Geo-Distributed Application Deployment Using Benchmarking GeoDeploy:利用基准测试进行地理分布式应用部署
IF 5.6 2区 计算机科学 Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date : 2024-09-27 DOI: 10.1109/TPDS.2024.3470532
Devki Nandan Jha;Yinhao Li;Zhenyu Wen;Graham Morgan;Prem Prakash Jayaraman;Maciej Koutny;Omer F. Rana;Rajiv Ranjan
Geo-distributed web-applications (GWA) can be deployed across multiple geographically separated datacenters to reduce the latency of access for users. Finding a suitable deployment for a GWA is challenging due to the requirement to consider a number of different parameters, such as host configurations across a federated infrastructure. The ability to evaluate multiple deployment configurations enables an efficient outcome to be determined, balancing resource usage while satisfying user requirements. We propose GeoDeploy, a framework designed for finding a deployment solution for GWA. We evaluate GeoDeploy using both a formal algorithmic model and a practical cloud-based deployment. We also compare our approach with other existing techniques.
地理分布式网络应用程序(GWA)可以部署在多个地理位置分离的数据中心,以减少用户访问的延迟。由于需要考虑许多不同的参数,例如联合基础设施中的主机配置,因此为 GWA 找到合适的部署方式具有挑战性。评估多种部署配置的能力可以确定有效的结果,在满足用户需求的同时平衡资源使用。我们提出的 GeoDeploy 是一个旨在为 GWA 寻找部署解决方案的框架。我们使用正式算法模型和基于云的实际部署对 GeoDeploy 进行了评估。我们还将我们的方法与其他现有技术进行了比较。
{"title":"GeoDeploy: Geo-Distributed Application Deployment Using Benchmarking","authors":"Devki Nandan Jha;Yinhao Li;Zhenyu Wen;Graham Morgan;Prem Prakash Jayaraman;Maciej Koutny;Omer F. Rana;Rajiv Ranjan","doi":"10.1109/TPDS.2024.3470532","DOIUrl":"https://doi.org/10.1109/TPDS.2024.3470532","url":null,"abstract":"Geo-distributed web-applications (GWA) can be deployed across multiple geographically separated datacenters to reduce the latency of access for users. Finding a suitable deployment for a GWA is challenging due to the requirement to consider a number of different parameters, such as host configurations across a federated infrastructure. The ability to evaluate multiple deployment configurations enables an efficient outcome to be determined, balancing resource usage while satisfying user requirements. We propose \u0000<sc>GeoDeploy</small>\u0000, a framework designed for finding a deployment solution for GWA. We evaluate \u0000<sc>GeoDeploy</small>\u0000 using both a formal algorithmic model and a practical cloud-based deployment. We also compare our approach with other existing techniques.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"35 12","pages":"2361-2374"},"PeriodicalIF":5.6,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142438574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
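GeoDeploy's core loop — evaluate candidate deployment configurations against benchmark measurements, then pick one that balances resource usage against user requirements — can be caricatured in a few lines. This is a hypothetical selection rule (cheapest candidate meeting a latency SLO), not GeoDeploy's actual objective; the field names are assumptions:

```python
def pick_deployment(candidates, user_regions, slo_ms):
    """Choose the cheapest deployment whose worst-case user latency
    meets the SLO.

    candidates: name -> {'cost': $/h, 'latency_ms': {region: benchmark}}.
    Returns the chosen name, or None if no candidate meets the SLO.
    """
    feasible = []
    for name, c in candidates.items():
        # worst-case latency over all regions users access from
        worst = max(c["latency_ms"][r] for r in user_regions)
        if worst <= slo_ms:
            feasible.append((c["cost"], name))
    return min(feasible)[1] if feasible else None
```

The point of benchmarking in this setting is that `latency_ms` comes from measurements of real candidate deployments rather than analytical models, which is what makes the search across federated host configurations tractable to compare.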
Efficient Distributed Edge Computing for Dependent Delay-Sensitive Tasks in Multi-Operator Multi-Access Networks 在多运营商多接入网络中针对依赖性延迟敏感任务的高效分布式边缘计算
IF 5.6 2区 计算机科学 Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date : 2024-09-26 DOI: 10.1109/TPDS.2024.3468892
Alia Asheralieva;Dusit Niyato;Xuetao Wei
We study the problem of distributed computing in the multi-operator multi-access edge computing (MEC) network for dependent tasks. Every task comprises several sub-tasks which are executed based on logical precedence modelled as a directed acyclic graph. In the graph, each vertex is a sub-task and each edge a precedence constraint, such that a sub-task can only be started after all its preceding sub-tasks are completed. Tasks are executed by MEC servers with the assistance of nearby edge devices, so that the MEC network can be viewed as a distributed “primary-secondary node” system where each MEC server acts as a primary node (PN) deciding on sub-tasks assigned to its secondary nodes (SNs), i.e., nearby edge devices. The PN's decision problem is complex, as its SNs can be associated with other neighboring PNs. In this case, the available processing resources of SNs depend on the sub-task assignment decisions of all neighboring PNs. Since PNs are controlled by different operators, they do not coordinate their decisions, and each PN is uncertain about the sub-task assignments of its neighbors (and, thus, the available resources of its SNs). To address this problem, we propose a novel framework based on a graphical Bayesian game, where PNs play under uncertainty about their neighbors’ decisions. We prove that the game has a perfect Bayesian equilibrium (PBE) yielding unique optimal values, and formulate new Bayesian reinforcement learning and Bayesian deep reinforcement learning algorithms enabling each PN to reach the PBE autonomously (without communicating with other PNs).
我们研究了多操作员多访问边缘计算(MEC)网络中依赖任务的分布式计算问题。每个任务都由若干个子任务组成,这些子任务根据逻辑优先级执行,被模拟为有向无环图。在该图中,每个顶点都是一个子任务,每条边都是优先级约束,只有在前面所有子任务都完成后,才能启动一个子任务。任务由 MEC 服务器在附近边缘设备的协助下执行,因此 MEC 网络可视为一个分布式 "主-次节点 "系统,其中每个 MEC 服务器作为主节点 (PN),决定分配给其次节点 (SN)(即附近的边缘设备)的子任务。PN 的决策问题很复杂,因为其 SN 可能与其他相邻的 PN 相关联。在这种情况下,SN 的可用处理资源取决于所有相邻 PN 的子任务分配决策。由于 PN 由不同的操作员控制,它们不会协调其决策,因此每个 PN 都不确定其邻居的子任务分配(因此也不确定其 SN 的可用资源)。为了解决这个问题,我们提出了一个基于图形贝叶斯博弈的新框架,其中 PN 在不确定其邻居决策的情况下进行博弈。我们证明该博弈有一个完美贝叶斯均衡(PBE),它能产生唯一的最优值,并提出了新的贝叶斯强化学习和贝叶斯深度强化学习算法,使每个 PN 都能自主达到 PBE(无需与其他 PN 通信)。
{"title":"Efficient Distributed Edge Computing for Dependent Delay-Sensitive Tasks in Multi-Operator Multi-Access Networks","authors":"Alia Asheralieva;Dusit Niyato;Xuetao Wei","doi":"10.1109/TPDS.2024.3468892","DOIUrl":"https://doi.org/10.1109/TPDS.2024.3468892","url":null,"abstract":"We study the problem of distributed computing in the \u0000<italic>multi-operator multi-access edge computing</i>\u0000 (MEC) network for \u0000<italic>dependent tasks</i>\u0000. Every task comprises several \u0000<italic>sub-tasks</i>\u0000 which are executed based on logical precedence modelled as a \u0000<italic>directed acyclic graph</i>\u0000. In the graph, each vertex is a sub-task, each edge – precedence constraint, such that a sub-task can only be started after all its preceding sub-tasks are completed. Tasks are executed by MEC servers with the assistance of nearby edge devices, so that the MEC network can be viewed as a \u0000<italic>distributed</i>\u0000 “\u0000<italic>primary-secondary node</i>\u0000” system where each MEC server acts as a \u0000<italic>primary node</i>\u0000 (PN) deciding on sub-tasks assigned to its \u0000<italic>secondary nodes</i>\u0000 (SNs), i.e., nearby edge devices. The PN's decision problem is complex, as its SNs can be associated with other \u0000<italic>neighboring</i>\u0000 PNs. In this case, the available processing resources of SNs depend on the sub-task assignment decisions of all neighboring PNs. Since PNs are controlled by different operators, they do not coordinate their decisions, and each PN is uncertain about the sub-task assignments of its neighbors (and, thus, the available resources of its SNs). To address this problem, we propose a novel framework based on a \u0000<italic>graphical Bayesian game</i>\u0000, where PNs play under uncertainty about their neighbors’ decisions. 
We prove that the game has a \u0000<italic>perfect Bayesian equilibrium</i>\u0000 (PBE) yielding \u0000<italic>unique optimal values</i>\u0000, and formulate new \u0000<italic>Bayesian reinforcement learning</i>\u0000 and \u0000<italic>Bayesian deep reinforcement learning</i>\u0000 algorithms enabling each PN to reach the PBE autonomously (without communicating with other PNs).","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"35 12","pages":"2559-2577"},"PeriodicalIF":5.6,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
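The precedence model in this abstract — a DAG where a sub-task may start only after all its predecessors finish — is the standard topological-ordering constraint. As a self-contained sketch of just that constraint (Kahn's algorithm; the game-theoretic assignment itself is far beyond a few lines), one valid execution order can be derived as follows:

```python
from collections import deque, defaultdict

def precedence_order(num_subtasks, edges):
    """Kahn's algorithm: return one execution order in which every
    sub-task starts only after all its predecessors have completed.

    edges: list of (u, v) meaning sub-task u must finish before v starts.
    Raises ValueError if the precedence graph contains a cycle.
    """
    indeg = [0] * num_subtasks
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = deque(i for i in range(num_subtasks) if indeg[i] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:  # all predecessors of v are done
                ready.append(v)
    if len(order) != num_subtasks:
        raise ValueError("precedence graph contains a cycle")
    return order
```

In the paper's setting each PN would additionally decide *where* each ready sub-task runs; the topological frontier (`ready` above) is exactly the set of sub-tasks eligible for assignment at any moment.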