The proliferation of vehicular networks within intelligent transportation systems (ITS) has significantly increased the demand for efficient and adaptive spectrum resource allocation. Spectrum coordination is challenging under dense vehicle traffic, dynamic communication environments, and diverse service requirements, and these challenges are especially acute in Vehicle-to-Everything (V2X) communications, where rapidly changing conditions demand robust solutions. Multi-agent reinforcement learning (MARL) has shown promise for managing dynamic spectrum access, but existing approaches suffer from overestimated value functions, unstable policy convergence, and reliance on manually designed reward functions, which limits their practical applicability. This paper presents IRL-D3QN, a spectrum management framework that combines Inverse Reinforcement Learning (IRL) with a Dueling Double Deep Q-Network (D3QN). The algorithm employs a reward prediction network that infers intrinsic motivation from the agent's interaction with the environment, removing the need for hand-crafted reward design and improving generalization across scenarios. The dueling network architecture stabilizes learning by separating state values from action advantages, while double Q-learning mitigates overestimation bias. Simulations show that IRL-D3QN improves the Vehicle-to-Infrastructure (V2I) transmission rate by 7.94% and exhibits significantly less performance degradation under heavy communication loads than state-of-the-art RL algorithms, offering a scalable and self-sufficient solution for dynamic spectrum allocation in next-generation vehicular communication systems.
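To make the architectural ingredients named in the abstract concrete, the following is a minimal sketch of a dueling double-DQN (D3QN) value network and its double Q-learning update in PyTorch. The class name, hidden size, state/action dimensions, and the suggestion that the reward signal could come from a learned reward-prediction network are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal D3QN sketch (illustrative only; not the paper's code).
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling architecture: separate state-value and action-advantage streams."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # A(s, a)

    def forward(self, s):
        h = self.feature(s)
        v, a = self.value(h), self.advantage(h)
        # Combine the two streams; subtracting the mean advantage keeps Q identifiable.
        return v + a - a.mean(dim=1, keepdim=True)

def d3qn_loss(online, target, batch, gamma=0.99):
    """Double Q-learning: the online net selects the next action, the target net evaluates it."""
    # In an IRL-style setup, r would be produced by a learned reward-prediction
    # network rather than a hand-crafted reward function (assumption for illustration).
    s, a, r, s_next, done = batch
    q_sa = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_a = online(s_next).argmax(dim=1, keepdim=True)    # action selection
        next_q = target(s_next).gather(1, next_a).squeeze(1)   # action evaluation
        y = r + gamma * (1.0 - done) * next_q
    return nn.functional.smooth_l1_loss(q_sa, y)
```

Decoupling action selection (online network) from action evaluation (target network) is what reduces the overestimation bias mentioned above, while the value/advantage split is what the dueling design uses to stabilize learning.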
