Sparse Mobile Crowdsensing (SMCS) supports large-scale urban sensing by collecting data from only a few sub-regions and inferring the data of unsensed sub-regions from the spatiotemporal relationships in the collected data. However, because sensing data exhibit complex spatiotemporal correlations, extracting nonlinear spatiotemporal features from sparse data is exceptionally challenging, even though these features are crucial for accurate data inference and future data prediction. Furthermore, existing cell selection methods often overlook the temporal variation of urban sensing data and thus fail to make adequate use of the historical and predicted data needed to obtain the optimal subset of sensing regions. To address these issues, this study proposes a deep-learning-based sparse urban sensing scheme that exploits spatiotemporal correlations. The scheme comprises data completion, short-term spatiotemporal prediction, and cell selection, and aims to produce high-quality urban sensing maps within budget constraints. First, to handle sparse sensing data, a Spatio-Temporal Deep Matrix Factorization (STDMF) model is proposed to accurately recover the current full map. Subsequently, leveraging predicted and completed historical data, this study constructs spatiotemporal states, rewards, and actions for deep reinforcement learning. A cell selection algorithm called Spatio-Temporal Prediction Assisted Dueling Double Deep Q Network (STPA-D3QN) is then proposed, which uses a spatiotemporal dueling deep Q-network to discern spatiotemporal features both within and across observation states and then identifies the optimal choices for specific states. Finally, extensive experiments on four sensing tasks in air quality monitoring verify the effectiveness of the proposed algorithm.
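For readers unfamiliar with the "dueling double" part of the STPA-D3QN name, the following is a minimal sketch of the two standard building blocks that name refers to: the dueling aggregation of a state value and per-action advantages, and the double-DQN bootstrap target. It is not the implementation proposed here. The function names, the plain-list inputs, and the reading of each action as "select one candidate cell" are assumptions made purely for illustration, and the spatiotemporal feature extraction and prediction-assisted state construction are not modeled.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Standard dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


def double_dqn_target(
    reward: float,
    gamma: float,
    q_online_next: List[float],
    q_target_next: List[float],
    done: bool,
) -> float:
    """Standard double-DQN target.

    The online network chooses the next action; the target network
    evaluates it. Terminal transitions do not bootstrap.
    """
    if done:
        return reward
    # Action selection uses the online network's Q-values.
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # Action evaluation uses the target network's Q-values.
    return reward + gamma * q_target_next[best_action]


# Hypothetical example with three candidate cells (assumed values).
q_online_next = dueling_q_values(1.0, [0.5, -0.5, 0.0])  # [1.5, 0.5, 1.0]
q_target_next = [0.2, 0.9, 0.4]
target = double_dqn_target(1.0, 0.9, q_online_next, q_target_next, done=False)
```

In this example the online network prefers cell 0, so the target bootstraps from the target network's estimate for cell 0 (0.2) rather than from its own maximum (0.9). Decoupling selection from evaluation in this way is what double Q-learning uses to reduce overestimation.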