To address the excessive parameter counts and poor real-time performance of existing models for distracted-driving detection, this work proposes a detection model based on a cascaded convolutional network and an attention mechanism. The model adopts a two-stage architecture. In the first stage, a pre-trained MobileNet serves as the backbone network, extracting basic image features efficiently and substantially reducing computational complexity. In the second stage, these basic features are enhanced by combining a cascaded ResNet structure with a spatial attention mechanism, improving the model's ability to capture key features. Finally, the features from the two stages are fused to recognize the driver's distraction behavior. Experimental results on two public datasets, the American University in Cairo (AUC) Distracted Driver dataset and the StateFarm Distracted Driver (SFD) dataset, show that the proposed model achieves recognition accuracies of 95.72% and 99.87%, respectively, significantly outperforming existing mainstream methods while maintaining a low parameter count. The model thus combines a small parameter count, high detection accuracy, and high real-time performance.
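As a rough illustration of the second-stage enhancement and the final fusion, the sketch below applies a simplified spatial attention mask to a small feature map and then fuses two feature maps. It is not the authors' implementation. The function names `spatial_attention` and `fuse` are hypothetical. The weighted sum of channel-wise average and max pooling stands in for the learned convolution that spatial attention modules typically use, and fusion by channel concatenation is an assumption, since the fusion operator is not specified here.

```python
import math


def spatial_attention(features, w_avg=1.0, w_max=1.0, bias=0.0):
    """Apply a simplified spatial attention mask to a C x H x W feature map.

    `features` is a nested list indexed as features[channel][row][col].
    The weights w_avg, w_max and bias are illustrative stand-ins for
    learned parameters (assumption: a real module would learn a
    convolution over the two pooled maps instead).
    """
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])

    # One attention value per spatial location, shared by all channels.
    mask = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            values = [features[c][i][j] for c in range(channels)]
            avg_pool = sum(values) / channels   # channel-wise average pooling
            max_pool = max(values)              # channel-wise max pooling
            score = w_avg * avg_pool + w_max * max_pool + bias
            mask[i][j] = 1.0 / (1.0 + math.exp(-score))  # sigmoid gate

    # Reweight every channel by the same spatial mask.
    attended = [
        [[features[c][i][j] * mask[i][j] for j in range(width)]
         for i in range(height)]
        for c in range(channels)
    ]
    return attended, mask


def fuse(stage1_features, stage2_features):
    """Fuse two feature maps by concatenating along the channel axis.

    Assumption: concatenation is only one possible fusion choice.
    """
    return stage1_features + stage2_features
```

In this sketch, locations where the pooled responses are large receive a mask value close to 1 and are preserved, while locations with strongly negative responses are suppressed toward 0. The fused output simply stacks the first-stage channels and the attended second-stage channels.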