Efficient resource management and accurate prediction of cloud workloads are vital in modern cloud computing environments, where dynamic and volatile workloads present significant challenges. Traditional forecasting models often fail to fully capture the intricate temporal dependencies and non-linear patterns inherent in cloud data, leading to inefficiencies in resource utilization. To overcome these limitations, this research introduces the MultiLayer Multivariate Resource Predictor (MMRP), a novel deep learning architecture that integrates a Multi-Head Attention Transformer model with Convolutional Neural Networks and Bidirectional Long Short-Term Memory units. The proposed model is designed to capture long-range dependencies and complex patterns, thereby significantly enhancing the accuracy of workload predictions. Extensive experimentation using real-world Alibaba and Google cluster traces reveals that the proposed model consistently outperforms existing state-of-the-art and related cloud resource utilization prediction models in both univariate and multivariate time series forecasting tasks. The model demonstrates a remarkable improvement in prediction performance, with an average R-squared increase of 5.76% and a Mean Absolute Percentage Error reduction of 84.9% compared to the best-performing baseline models. Furthermore, our model reduces Root Mean Square Error by approximately 35.34% and Mean Absolute Error by about 39.49% on average. Its scalability and adaptability across various cloud environments underscore the proposed model’s potential to optimize resource allocation, paving the way for more efficient and reliable cloud-based systems.
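To make the described pipeline concrete, the following is a minimal PyTorch sketch of the CNN → BiLSTM → multi-head-attention flow named above. The class name `MMRPSketch`, all layer sizes, the window length, and the single-step regression head are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class MMRPSketch(nn.Module):
    """Hypothetical sketch of the MMRP pipeline described in the abstract:
    Conv1d feature extraction -> BiLSTM -> multi-head self-attention -> regression head.
    Layer sizes here are illustrative placeholders, not the published configuration."""

    def __init__(self, n_features: int, hidden: int = 64, heads: int = 4):
        super().__init__()
        # CNN stage: extract local temporal patterns from the multivariate input.
        self.conv = nn.Conv1d(n_features, hidden, kernel_size=3, padding=1)
        # BiLSTM stage: model ordered dependencies in both time directions.
        self.bilstm = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)
        # Multi-head self-attention stage: capture long-range dependencies across steps.
        self.attn = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)
        # Regression head: forecast the next resource-utilization values.
        self.head = nn.Linear(2 * hidden, n_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, features); Conv1d expects (batch, channels, time).
        h = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        h, _ = self.bilstm(h)         # (batch, time, 2 * hidden)
        h, _ = self.attn(h, h, h)     # self-attention over the time axis
        return self.head(h[:, -1])    # one-step forecast from the last position

# Usage: predict the next step of 4 resource metrics from a 48-step window.
model = MMRPSketch(n_features=4)
window = torch.randn(8, 48, 4)        # (batch, time, features)
print(model(window).shape)            # torch.Size([8, 4])
```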