A Novel Resource Management Framework for Blockchain-Based Federated Learning in IoT Networks

IF 3.0 | CAS Tier 3 (Computer Science) | Q2 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE) | IEEE Transactions on Sustainable Computing | Pub Date: 2024-01-26 | DOI: 10.1109/TSUSC.2024.3358915
Aman Mishra;Yash Garg;Om Jee Pandey;Mahendra K. Shukla;Athanasios V. Vasilakos;Rajesh M. Hegde
{"title":"物联网网络中基于区块链的联盟学习的新型资源管理框架","authors":"Aman Mishra;Yash Garg;Om Jee Pandey;Mahendra K. Shukla;Athanasios V. Vasilakos;Rajesh M. Hegde","doi":"10.1109/TSUSC.2024.3358915","DOIUrl":null,"url":null,"abstract":"At present, the centralized learning models, used for IoT applications generating large amount of data, face several challenges such as bandwidth scarcity, more energy consumption, increased uses of computing resources, poor connectivity, high computational complexity, reduced privacy, and large latency towards data transfer. In order to address the aforementioned challenges, Blockchain-Enabled Federated Learning Networks (BFLNs) emerged recently, which deal with trained model parameters only, rather than raw data. BFLNs provide enhanced security along with improved energy-efficiency and Quality-of-Service (QoS). However, BFLNs suffer with the challenges of exponential increased action space in deciding various parameter levels towards training and block generation. Motivated by aforementioned challenges of BFLNs, in this work, we are proposing an actor-critic Reinforcement Learning (RL) method to model the Machine Learning Model Owner (MLMO) in selecting the optimal set of parameter levels, addressing the challenges of exponential grow of action space in BFLNs. Further, due to the implicit entropy exploration, actor-critic RL method balances the exploration-exploitation trade-off and shows better performance than most off-policy methods, on large discrete action spaces. Therefore, in this work, considering the mobile scenario of the devices, MLMO decides the data and energy levels that the mobile devices use for the training and determine the block generation rate. This leads to minimized system latency and reduced overall cost, while achieving the target accuracy. Specifically, we have used Proximal Policy Optimization (PPO) as an on-policy actor-critic method with it's two variants, one based on Monte Carlo (MC) returns and another based on Generalized Advantage Estimate (GAE). We analyzed that PPO has better exploration and sample efficiency, lesser training time, and consistently higher cumulative rewards, when compared to off-policy Deep Q-Network (DQN).","PeriodicalId":13268,"journal":{"name":"IEEE Transactions on Sustainable Computing","volume":null,"pages":null},"PeriodicalIF":3.0000,"publicationDate":"2024-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Novel Resource Management Framework for Blockchain-Based Federated Learning in IoT Networks\",\"authors\":\"Aman Mishra;Yash Garg;Om Jee Pandey;Mahendra K. Shukla;Athanasios V. Vasilakos;Rajesh M. Hegde\",\"doi\":\"10.1109/TSUSC.2024.3358915\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"At present, the centralized learning models, used for IoT applications generating large amount of data, face several challenges such as bandwidth scarcity, more energy consumption, increased uses of computing resources, poor connectivity, high computational complexity, reduced privacy, and large latency towards data transfer. In order to address the aforementioned challenges, Blockchain-Enabled Federated Learning Networks (BFLNs) emerged recently, which deal with trained model parameters only, rather than raw data. BFLNs provide enhanced security along with improved energy-efficiency and Quality-of-Service (QoS). 
However, BFLNs suffer with the challenges of exponential increased action space in deciding various parameter levels towards training and block generation. Motivated by aforementioned challenges of BFLNs, in this work, we are proposing an actor-critic Reinforcement Learning (RL) method to model the Machine Learning Model Owner (MLMO) in selecting the optimal set of parameter levels, addressing the challenges of exponential grow of action space in BFLNs. Further, due to the implicit entropy exploration, actor-critic RL method balances the exploration-exploitation trade-off and shows better performance than most off-policy methods, on large discrete action spaces. Therefore, in this work, considering the mobile scenario of the devices, MLMO decides the data and energy levels that the mobile devices use for the training and determine the block generation rate. This leads to minimized system latency and reduced overall cost, while achieving the target accuracy. Specifically, we have used Proximal Policy Optimization (PPO) as an on-policy actor-critic method with it's two variants, one based on Monte Carlo (MC) returns and another based on Generalized Advantage Estimate (GAE). We analyzed that PPO has better exploration and sample efficiency, lesser training time, and consistently higher cumulative rewards, when compared to off-policy Deep Q-Network (DQN).\",\"PeriodicalId\":13268,\"journal\":{\"name\":\"IEEE Transactions on Sustainable Computing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2024-01-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Sustainable Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10415198/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Sustainable Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10415198/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0

Abstract

At present, centralized learning models used for IoT applications that generate large amounts of data face several challenges, such as bandwidth scarcity, higher energy consumption, increased use of computing resources, poor connectivity, high computational complexity, reduced privacy, and large data-transfer latency. To address these challenges, Blockchain-Enabled Federated Learning Networks (BFLNs) have recently emerged; they exchange only trained model parameters rather than raw data. BFLNs provide enhanced security along with improved energy efficiency and Quality of Service (QoS). However, BFLNs face an exponentially growing action space when deciding the various parameter levels for training and block generation. Motivated by these challenges, this work proposes an actor-critic Reinforcement Learning (RL) method to model the Machine Learning Model Owner (MLMO) in selecting the optimal set of parameter levels, addressing the exponential growth of the action space in BFLNs. Furthermore, owing to its implicit entropy-driven exploration, the actor-critic RL method balances the exploration-exploitation trade-off and outperforms most off-policy methods on large discrete action spaces. Therefore, in this work, considering the mobility of the devices, the MLMO decides the data and energy levels that the mobile devices use for training and determines the block generation rate. This minimizes system latency and reduces overall cost while achieving the target accuracy. Specifically, Proximal Policy Optimization (PPO) is used as an on-policy actor-critic method with two variants, one based on Monte Carlo (MC) returns and the other based on the Generalized Advantage Estimate (GAE). The analysis shows that PPO achieves better exploration and sample efficiency, shorter training time, and consistently higher cumulative rewards than the off-policy Deep Q-Network (DQN).
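To make the two PPO variants mentioned in the abstract concrete, the short Python sketch below contrasts Monte Carlo returns with Generalized Advantage Estimation, and also illustrates why a joint discrete action over per-device data levels, per-device energy levels, and a block-generation rate grows so quickly. This is a minimal illustration under assumed sizes and rewards, not the authors' implementation; all numbers are hypothetical placeholders.

import numpy as np

# Hypothetical sizes, only to show how a joint discrete action space explodes:
# a data level and an energy level per device, plus one block-generation rate choice.
n_data_levels, n_energy_levels, n_block_rates, n_devices = 3, 3, 4, 10
joint_actions = (n_data_levels * n_energy_levels) ** n_devices * n_block_rates
print("joint discrete actions:", joint_actions)  # (3*3)**10 * 4, roughly 1.4e10

def mc_returns(rewards, gamma=0.99):
    # Monte Carlo returns: discounted sum of future rewards from each step.
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    # Generalized Advantage Estimation: exponentially weighted sum of one-step
    # TD errors. `values` holds the critic's state-value estimates, with one
    # extra entry for the state after the last step (0 if the episode ends there).
    T = len(rewards)
    advantages = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

# Toy episode: placeholder per-round rewards, e.g. negative latency/energy cost
# plus a terminal bonus for reaching the target accuracy.
rewards = np.array([-1.2, -0.8, -0.5, 2.0])
values = np.array([0.5, 0.4, 0.6, 1.0, 0.0])   # critic estimates; last entry is terminal
print("MC returns:     ", mc_returns(rewards))
print("GAE advantages: ", gae_advantages(rewards, values))

In the MC variant the advantage is the return minus the critic's value estimate, whereas GAE trades bias against variance through the lambda parameter; either estimate then feeds PPO's clipped surrogate objective.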
Source Journal
IEEE Transactions on Sustainable Computing
Subject category: Mathematics - Control and Optimization
CiteScore: 7.70
Self-citation rate: 2.60%
Articles published: 54