Accurate prediction of air pollutant concentrations, particularly of inhalable particulate matter such as PM2.5, is crucial for proactive measures that safeguard the well-being of urban residents. This paper addresses the perceptible latency effect in long-term PM2.5 predictions produced by existing statistical models. We emphasize the importance of numerical computations in capturing substantial concentration changes, and we enhance prediction accuracy by integrating them with high-dimensional, diverse urban data. Specifically, our approach collects data from a global-to-meso-scale atmospheric dispersion model, the System for Integrated modeLling of Atmospheric coMposition (SILAM), along with numerical weather forecasts, traffic congestion measurements, meteorological factors, and static sources (road networks and points of interest). We find that existing deep learning models are prone to overfitting on such complex datasets, primarily because they treat diverse data types uniformly as time series without adapting to the specific characteristics of each type. To counter this, we propose a simple yet transferable deep learning architecture that focuses on the proper use of each data type. A comparative case study in Shenzhen, China, shows that our model not only improves SILAM's dispersion accuracy for 24h-ahead PM2.5 forecasts by 30.3%, but also mitigates the noticeable latency effect of existing models by 19.5%. Finally, an ablation study validates the contribution of each data source and module of our approach.