Blockchain Sharding Over Wireless Channels: Dynamic Resource Allocation With Sparse Reward
Authors: Zhangjun Ren; Long Shi; Zhe Wang; Jun Li; Zehui Xiong
Journal: IEEE Wireless Communications Letters, vol. 14, no. 1, pp. 63-67
Publication date: 2024-11-04
Publication type: Journal Article
DOI: 10.1109/LWC.2024.3487307
URL: https://ieeexplore.ieee.org/document/10742407/
Impact factor: 5.5 (JCR Q1, Computer Science, Information Systems)
Citation count: 0
Abstract
Blockchain sharding over wireless channels (BSoW) is a promising technology for enhancing the throughput of blockchain systems by processing transactions in parallel across multiple shards. Nonetheless, the time-varying nature of wireless fading channels may result in unequal transaction processing rates across shards, which limits the overall delay performance of the system. In this letter, we formulate dynamic resource allocation as a Markov decision process (MDP) that jointly optimizes the bandwidth allocation and block sizes, aiming to minimize the cumulative transaction completion latency of the BSoW network. Each transaction undergoes a two-stage consensus: it first waits in the transaction queue for member-PBFT consensus, and then waits in the block queue for final-PBFT consensus before being appended to the main chain. Therefore, the reward feedback from the environment is delayed and sparse. Due to the lack of intermediate feedback, conventional temporal-difference-based reinforcement learning algorithms such as Proximal Policy Optimization (PPO) may be unable to identify the impact of each action on the final reward, and thus suffer from low learning efficiency. To address this issue, we propose a Hindsight Distribution Correction Estimation (HDICE)-PPO algorithm that effectively learns the credit assignment of the sparse reward over intermediate actions for policy optimization. Simulation results show that the proposed algorithm reduces the average transaction latency by 46% compared with the Hindsight Credit Assignment (HCA)-PPO algorithm, and by 72.4% compared with the PPO algorithm.
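To make the delayed, sparse reward concrete, the following is a toy sketch and not the letter's system model. It assumes a slotted two-stage queue in which at most one block is formed per slot and a block is finalized a fixed number of slots after it is formed. The function name `simulate_two_stage` and the parameters `arrivals`, `block_size`, and `final_delay` are hypothetical; wireless fading, bandwidth allocation, and the PBFT protocol itself are not modeled.

```python
from collections import deque


def simulate_two_stage(arrivals, block_size, final_delay):
    """Toy two-stage queue: transaction queue -> block queue -> main chain.

    Returns one reward per slot: the negative summed latency of the
    transactions finalized in that slot, or 0 if nothing was finalized.
    All modeling choices here are illustrative assumptions.
    """
    tx_queue = deque()     # arrival slot of each waiting transaction
    block_queue = deque()  # (slot the block was formed, arrival slots inside it)
    rewards = []
    for t, n_new in enumerate(arrivals):
        tx_queue.extend([t] * n_new)

        # Stage 1 (stand-in for member consensus): pack at most one block
        # per slot once enough transactions are waiting.
        if len(tx_queue) >= block_size:
            block = [tx_queue.popleft() for _ in range(block_size)]
            block_queue.append((t, block))

        # Stage 2 (stand-in for final consensus): the oldest block is
        # appended to the chain after waiting final_delay slots.
        reward = 0
        if block_queue and t - block_queue[0][0] >= final_delay:
            _, block = block_queue.popleft()
            reward = -sum(t - arrived for arrived in block)
        rewards.append(reward)
    return rewards


rewards = simulate_two_stage(
    arrivals=[1, 1, 1, 1, 0, 0, 0, 0], block_size=2, final_delay=3
)
print(rewards)  # [0, 0, 0, 0, -7, 0, -7, 0]
```

In this toy run only two of the eight slots carry a nonzero reward, and each of those rewards reflects block-size decisions taken several slots earlier. That gap between action and feedback is the credit-assignment difficulty the abstract describes.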
About the journal:
IEEE Wireless Communications Letters publishes short papers in a rapid publication cycle on advances in the state-of-the-art of wireless communications. Both theoretical contributions (including new techniques, concepts, and analyses) and practical contributions (including system experiments and prototypes, and new applications) are encouraged. This journal focuses on the physical layer and the link layer of wireless communication systems.