Advances in drone technology and high-frequency millimeter-wave communications are transforming unmanned aerial vehicle (UAV)-aided networks and expanding their potential across diverse applications. Although UAV-aided millimeter-wave networks offer wide bandwidth and enhanced line-of-sight connectivity, achieving high network performance is challenging because UAV energy is limited and millimeter-wave signals suffer large path loss. This challenge is more severe in dynamically changing multi-UAV environments. To address it, we propose a novel approach based on multi-agent deep reinforcement learning, called action-branching QMIX. Our method determines near-optimal codebook-based discrete beamforming vectors and UAV trajectories while balancing communication efficiency against energy consumption. The proposed approach employs a new Long Short-Term Memory module to handle long sequences effectively, enabling it to adapt to changing environmental variables in real time. We thoroughly evaluate the proposed control scheme with a channel model based on real-world measurements. The evaluation confirms that the proposed control scheme converges stably and consistently, and that it outperforms traditional benchmark multi-agent reinforcement learning schemes in downlink data rate, success rate of reaching the destination, and service duration. These results highlight the enhanced energy sustainability, robustness, and stability of the proposed approach in dynamically changing multi-UAV environments.