Blockchain Sharding Over Wireless Channels: Dynamic Resource Allocation With Sparse Reward

IF 5.5 | CAS Zone 3 (Computer Science) | JCR Q1: COMPUTER SCIENCE, INFORMATION SYSTEMS | IEEE Wireless Communications Letters | Pub Date: 2024-11-04 | DOI: 10.1109/LWC.2024.3487307

Zhangjun Ren, Long Shi, Zhe Wang, Jun Li, Zehui Xiong

IEEE Wireless Communications Letters, vol. 14, no. 1, pp. 63-67. https://ieeexplore.ieee.org/document/10742407/

Citations: 0

Abstract

Blockchain sharding over wireless channels (BSoW) is a promising technology for increasing the throughput of blockchain systems by processing transactions in parallel across multiple shards. However, the time-varying nature of wireless fading channels may cause unequal transaction processing rates across shards, which degrades the overall system delay. In this letter, we formulate dynamic resource allocation as a Markov decision process (MDP) that jointly optimizes bandwidth allocation and block sizes to minimize the cumulative transaction completion latency of the BSoW network. Each transaction undergoes a two-stage consensus: it first waits in the transaction queue for member-PBFT consensus, and then waits in the block queue for final-PBFT consensus before being appended to the main chain. As a result, the reward feedback from the environment is delayed and sparse. Without intermediate feedback, conventional temporal-difference-based reinforcement learning algorithms such as Proximal Policy Optimization (PPO) may be unable to identify the impact of each action on the final reward, and therefore learn inefficiently. To address this issue, we propose a Hindsight Distribution Correction Estimation (HDICE)-PPO algorithm that learns how to assign credit for the sparse reward to intermediate actions during policy optimization. Simulation results show that the proposed algorithm reduces the average transaction latency by 46% compared with the Hindsight Credit Assignment (HCA)-PPO algorithm and by 72.4% compared with the PPO algorithm.
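To make the "delayed and sparse" reward concrete, the following is a minimal toy sketch of a two-stage queue for a single shard. It is not the authors' system model: the function name `simulate`, the fixed `block_size`, the periodic commit rule `final_period`, and the reward definition (negative summed latency of the transactions committed at a step) are all assumptions chosen for illustration, and wireless fading and bandwidth allocation are left out entirely.

```python
from collections import deque


def simulate(arrivals, block_size, final_period):
    """Toy two-stage pipeline for ONE shard (illustrative assumptions only).

    arrivals[t]  : number of transactions arriving at step t
    block_size   : transactions packed into a block at stage 1
    final_period : stage 2 commits one queued block every `final_period` steps
    Returns one reward per step: minus the summed latency of the
    transactions committed at that step, and 0 when nothing is committed.
    """
    tx_queue = deque()     # arrival times of transactions awaiting stage 1
    block_queue = deque()  # blocks (lists of arrival times) awaiting stage 2
    rewards = []
    for t, n in enumerate(arrivals):
        tx_queue.extend([t] * n)
        # Stage 1 (stand-in for member consensus): pack at most one full block.
        if len(tx_queue) >= block_size:
            block_queue.append([tx_queue.popleft() for _ in range(block_size)])
        # Stage 2 (stand-in for final consensus): commit one block periodically.
        reward = 0
        if (t + 1) % final_period == 0 and block_queue:
            block = block_queue.popleft()
            reward = -sum(t - arrived for arrived in block)
        rewards.append(reward)
    return rewards


# Two transactions arrive per step for six steps.
print(simulate([2] * 6, block_size=4, final_period=3))
# → [0, 0, -6, 0, 0, -10]
```

In this toy run only two of the six steps return a nonzero reward, and each of those rewards reflects decisions made several steps earlier. That gap between an action and its observable outcome is the credit-assignment difficulty the abstract describes.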


Source journal

IEEE Wireless Communications Letters (Engineering: Electrical and Electronic Engineering)

CiteScore: 12.30

Self-citation rate: 6.30%

Articles published: 481

About the journal: IEEE Wireless Communications Letters publishes short papers in a rapid publication cycle on advances in the state of the art of wireless communications. Both theoretical contributions (including new techniques, concepts, and analyses) and practical contributions (including system experiments and prototypes, and new applications) are encouraged. This journal focuses on the physical layer and the link layer of wireless communication systems.