Accurate reconstruction of a high-resolution 3D volume of the heart is critical for comprehensive cardiac assessment. However, cardiac magnetic resonance (CMR) data are usually acquired as a stack of 2D short-axis (SAX) slices, which suffer from inter-slice misalignment due to cardiac motion and from data sparsity due to large gaps between slices. We propose an end-to-end deep learning (DL) model that addresses these two challenges simultaneously, with a dedicated model component for each. The objective is to reconstruct a high-resolution 3D volume of the heart from the acquired CMR SAX slices. We define the transformation from the acquired slices to the 3D volume as a sequential process of motion correction followed by super-resolution. Accordingly, our DL model incorporates two distinct components. The first component performs motion correction by predicting a displacement vector that re-positions each SAX slice. The second component takes the motion-corrected SAX slices from the first component and performs super-resolution to fill the gaps between slices. The two components operate sequentially, and the entire model is trained end-to-end. In a simulation dataset, our model significantly reduced inter-slice misalignment from 3.33 ± 0.74 mm to 1.36 ± 0.63 mm and generated accurate high-resolution 3D volumes, with Dice of 0.974 ± 0.010 for the left ventricle (LV) and 0.938 ± 0.017 for the myocardium. When compared to long-axis (LAX) contours in a real-world dataset, our model achieved Dice of 0.945 ± 0.023 for the LV and 0.786 ± 0.060 for the myocardium. In both datasets, our model with dedicated components for motion correction and super-resolution significantly improved performance compared to a model without these components. The code for our model is available at https://github.com/zhennongchen/CMR_MC_SR_End2End.
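To make the motion-correction step and the Dice metric concrete, the following is a minimal illustrative sketch and not the released implementation. It assumes binary segmentation masks stored as nested lists, integer in-plane displacements given in pixels, and zero-filling at the borders. The names `shift_slice`, `correct_stack`, and `dice` are hypothetical. In the model described above the displacement vectors are predicted by a network; here they are simply supplied by hand, and the super-resolution step is not shown.

```python
from typing import List, Tuple

Mask = List[List[int]]  # one 2D binary mask, indexed [row][col]


def shift_slice(mask: Mask, dy: int, dx: int) -> Mask:
    """Translate a 2D mask by (dy, dx) pixels, zero-filling exposed borders.

    Assumption: integer, in-plane shifts only (no interpolation).
    """
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dy, c + dx
            if 0 <= nr < rows and 0 <= nc < cols:
                out[nr][nc] = mask[r][c]
    return out


def correct_stack(stack: List[Mask],
                  displacements: List[Tuple[int, int]]) -> List[Mask]:
    """Re-position every slice in a stack with its own displacement vector."""
    return [shift_slice(m, dy, dx) for m, (dy, dx) in zip(stack, displacements)]


def dice(a: Mask, b: Mask) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    total = sum(map(sum, a)) + sum(map(sum, b))
    return 1.0 if total == 0 else 2.0 * inter / total


# Toy example: a 2x2 structure centered in a 4x4 slice.
reference = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
# Simulate a misaligned acquisition: the slice is displaced by one row.
moved = shift_slice(reference, 1, 0)
# Apply the (here hand-picked) corrective displacement to the one-slice stack.
corrected = correct_stack([moved], [(-1, 0)])[0]

print("Dice before correction:", dice(reference, moved))
print("Dice after correction:", dice(reference, corrected))
```

In this toy case the one-row misalignment halves the overlap with the reference, and applying the opposite displacement restores it, which is the effect the motion-correction component is meant to achieve before super-resolution.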