Selecting a suitable dataset for developing a data-driven forecasting model is often problematic. The choice is particularly important for air pollution, where concentration measurements are scattered over large areas. On the one hand, the classical approach builds a single-station (local) forecasting model using only the data collected at that station. This guarantees a training dataset that reflects all of the site’s specific characteristics. On the other hand, these data may be too limited to develop a robust predictor. One may therefore use data from other stations to complement the dataset, or develop a single model that uses all the data available within a region/domain. While this regional approach may tend to filter out large variations, it can incorporate information on peculiar episodes that have never occurred at a specific station. This paper discusses air pollution forecasting using the example of several stations in the Padana Plain, Northern Italy. Local forecasting models for nitrogen dioxide and ozone are developed using LSTM neural networks trained on hourly data from 2010 to 2023, and are then compared with regional models. All these models perform extremely well under various regression-based and classification-based performance indicators, except at a few sites with peculiar characteristics that can be considered to lie at the border of the information domain.