Blockchain Sharding Over Wireless Channels: Dynamic Resource Allocation With Sparse Reward

Impact factor: 5.5 | Zone 3 (Computer Science) | JCR Q1 (Computer Science, Information Systems) | IEEE Wireless Communications Letters | Pub Date: 2024-11-04 | DOI: 10.1109/LWC.2024.3487307
Zhangjun Ren;Long Shi;Zhe Wang;Jun Li;Zehui Xiong
IEEE Wireless Communications Letters, vol. 14, no. 1, pp. 63-67. Journal article. Full text: https://ieeexplore.ieee.org/document/10742407/
Citations: 0

Abstract

Blockchain sharding over wireless channels (BSoW) is a promising technology for enhancing the throughput of blockchain systems by processing transactions in parallel across multiple shards. Nonetheless, the time-varying nature of wireless fading channels may result in unequal transaction processing rates across shards, which bottlenecks the overall system delay. In this letter, we formulate dynamic resource allocation as a Markov decision process (MDP) that jointly optimizes the bandwidth allocation and block sizes, aiming to minimize the cumulative transaction completion latency of the BSoW network. Each transaction undergoes a two-stage consensus: it first waits in the transaction queue for member-PBFT consensus, and then waits in the block queue for final-PBFT consensus before being appended to the main chain. As a result, the reward feedback from the environment is delayed and sparse. Without intermediate feedback, conventional temporal-difference based reinforcement learning algorithms such as Proximal Policy Optimization (PPO) may be unable to identify the impact of each action on the final reward, and therefore suffer from low learning efficiency. To address this issue, we propose a Hindsight Distribution Correction Estimation (HDICE)-PPO algorithm that effectively learns how to assign credit for the sparse reward to intermediate actions during policy optimization. Simulation results show that the proposed algorithm reduces the average transaction latency by 46% compared with the Hindsight Credit Assignment (HCA)-PPO algorithm, and by 72.4% compared with the PPO algorithm.
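To make the "delayed and sparse" reward concrete, the following is a minimal Python sketch of a single-shard, two-stage pipeline. It is not the authors' system model: the function name `simulate`, the fixed arrival rate, the rule that a block is packed once `block_size` transactions are waiting, and the rule that one block is finalized every `final_period` steps are all illustrative assumptions. It only shows that latency feedback appears when a block leaves the second queue, and is zero at every other step.

```python
from collections import deque


def simulate(block_size, final_period, arrivals_per_step, steps):
    """Toy two-stage queue (all dynamics here are assumptions, not the paper's model).

    Returns one reward per step: the negative summed latency of the
    transactions finalized in that step, or 0 if nothing was finalized.
    """
    tx_queue = deque()     # arrival times of transactions awaiting stage 1
    block_queue = deque()  # packed blocks awaiting stage 2
    rewards = []
    for t in range(steps):
        # New transactions arrive and are stamped with the current step.
        tx_queue.extend([t] * arrivals_per_step)

        # Stage 1 (stand-in for member consensus): pack one block
        # as soon as enough transactions are waiting.
        if len(tx_queue) >= block_size:
            block = [tx_queue.popleft() for _ in range(block_size)]
            block_queue.append(block)

        # Stage 2 (stand-in for final consensus): finalize one block
        # every `final_period` steps; only then is latency observed.
        reward = 0
        if (t + 1) % final_period == 0 and block_queue:
            block = block_queue.popleft()
            reward = -sum(t - arrived for arrived in block)
        rewards.append(reward)
    return rewards


if __name__ == "__main__":
    print(simulate(block_size=4, final_period=5, arrivals_per_step=1, steps=10))
```

With these toy settings, only two of the ten steps carry a non-zero reward, even though every earlier step influenced the latency that is eventually reported. This is the credit assignment difficulty the abstract describes.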
Source journal
IEEE Wireless Communications Letters (Engineering: Electrical and Electronic Engineering)
CiteScore: 12.30
Self-citation rate: 6.30%
Articles published: 481
About the journal: IEEE Wireless Communications Letters publishes short papers in a rapid publication cycle on advances in the state of the art of wireless communications. Both theoretical contributions (including new techniques, concepts, and analyses) and practical contributions (including system experiments and prototypes, and new applications) are encouraged. The journal focuses on the physical layer and the link layer of wireless communication systems.
Latest articles from this journal
Theoretical Analysis of Active STAR-RIS Aided Wireless-Powered NOMA Systems
Movable Antennas-aided Wireless Energy Transfer for the Internet of Things
User-Adaptive Beam Hopping with Dynamic Beam Footprints in NGSO Satellite Networks
Vision-Aided Multi-Stream Hybrid Beamforming for Millimeter Wave MIMO Systems
Radar Mutual Information Maximization for Movable Antenna-Enabled ISAC Systems