A Sequential Experience-driven Contextual Bandit Policy for MIMO TWAF Online Relay Selection

Ankit Gupta, M. Sellathurai, T. Ratnarajah
Published in: 2022 IEEE 23rd International Workshop on Signal Processing Advances in Wireless Communication (SPAWC)
Publication date: 2022-07-04
DOI: 10.1109/spawc51304.2022.9834018 (https://doi.org/10.1109/spawc51304.2022.9834018)
Citation count: 0

Abstract

In this work, we derive sequential experience-driven contextual bandit (CB)-based policies for online relay selection in multiple-input multiple-output (MIMO) two-way amplify-and-forward (TWAF) relay networks, where the relays are provided with quantized, imperfect channel gain information. The proposed CB-based policies acquire information about the optimal relay node by resolving the exploration-versus-exploitation dilemma. In particular, we propose a linear upper confidence bound (LinUCB)-based CB policy and an adaptive active greedy (AAG)-based CB policy that utilizes active learning heuristics. Simulation results show that the proposed CB-based policies can reduce the feedback overhead by a factor of eight and the time cost by 70% while outperforming the best conventional Gram-Schmidt (GS) algorithm.
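
To make the LinUCB idea named in the abstract more concrete, here is a minimal sketch of the generic disjoint LinUCB rule applied to choosing one relay per round. It is not the authors' implementation: the class `LinUCBArm`, the functions `select_relay` and `run_demo`, the feature vectors standing in for quantized channel gains, and the synthetic linear reward are all assumptions made for illustration only. The AAG policy is not sketched, since the abstract gives no detail on it.

```python
import math
import random


class LinUCBArm:
    """Per-relay ridge-regression state for disjoint LinUCB (illustrative)."""

    def __init__(self, dim):
        # A starts as the identity, so its inverse is the identity too.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _A_inv_times(self, x):
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.A_inv]

    def ucb(self, x, alpha):
        """Estimated reward plus alpha times the confidence width."""
        v = self._A_inv_times(x)
        # theta = A_inv b, and A_inv is symmetric, so theta . x = b . (A_inv x).
        mean = sum(bi * vi for bi, vi in zip(self.b, v))
        width = math.sqrt(max(sum(xi * vi for xi, vi in zip(x, v)), 0.0))
        return mean + alpha * width

    def update(self, x, reward):
        """Sherman-Morrison rank-one update for A <- A + x x^T, b <- b + r x."""
        v = self._A_inv_times(x)
        denom = 1.0 + sum(xi * vi for xi, vi in zip(x, v))
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= v[i] * v[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def select_relay(arms, contexts, alpha):
    """Return the index of the relay with the largest upper confidence bound."""
    scores = [arm.ucb(x, alpha) for arm, x in zip(arms, contexts)]
    return max(range(len(arms)), key=scores.__getitem__)


def run_demo(num_relays=3, dim=2, rounds=500, alpha=1.0, seed=0):
    """Toy loop on synthetic data; returns how often the best relay was picked."""
    rng = random.Random(seed)
    # Hypothetical hidden reward weights per relay (not from the paper).
    true_w = [[rng.uniform(0, 1) for _ in range(dim)] for _ in range(num_relays)]
    arms = [LinUCBArm(dim) for _ in range(num_relays)]
    hits = 0
    for _ in range(rounds):
        # Stand-in context: one random feature vector per relay.
        contexts = [[rng.uniform(0, 1) for _ in range(dim)]
                    for _ in range(num_relays)]
        expected = [sum(w * x for w, x in zip(true_w[k], contexts[k]))
                    for k in range(num_relays)]
        k = select_relay(arms, contexts, alpha)
        arms[k].update(contexts[k], expected[k] + rng.gauss(0, 0.1))
        hits += int(k == max(range(num_relays), key=expected.__getitem__))
    return hits / rounds
```

The parameter `alpha` controls the exploration-versus-exploitation trade-off: a larger value inflates the confidence width, so relays that have been tried less often are selected more readily, while `alpha = 0` reduces the rule to a purely greedy choice on the current estimates.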