Rapid assessment of building damage levels is important after an earthquake and has received considerable attention in structural engineering. Traditional assessment relies on manual inspection, which is tedious and time-consuming. Deep learning for computer vision has advanced rapidly in recent years and has shown strong performance on image recognition tasks. This paper aims to develop an efficient approach for quickly recognizing post-earthquake structural damage levels from images. First, we perform feature extraction with seven pre-trained CNN models (Xception, InceptionV3, InceptionResNetV2, MobileNet, MobileNetV2, DenseNet121, NASNetMobile) on a small dataset of 2000 images. The CNN models are trained with five-fold cross-validation and their performance is compared on a testing set; the MobileNet model achieves the best classification performance, with an accuracy of 90.89 %. Second, Bayesian optimization and a fine-tuning strategy are used to find the optimal hyperparameters of the MobileNet model, which significantly improves its accuracy to 96.11 %. Third, gradient-weighted class activation mapping (Grad-CAM) is used to highlight the regions of structural damage images that contribute most to the CNN's prediction. Finally, the generalizability of the MobileNet model is improved by training it on an extended dataset of 3600 images. The proposed approach demonstrates the feasibility and potential of deep learning for image-based structural damage level recognition.
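
As a rough illustration of the five-fold cross-validation step mentioned above, the following minimal sketch shows one way image indices could be partitioned into five folds. It is not the authors' implementation: the function name `k_fold_indices`, the random seed, and the unstratified shuffling are assumptions made for illustration, and `n_samples=2000` is used only as an example because the abstract does not state how many of the 2000 images are held out for the testing set.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation.

    Hypothetical helper for illustration only; the split here is a plain
    shuffled split, not stratified by damage level.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Spread any remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        start += size
        yield train_idx, val_idx


# Example: 2000 images (illustrative count) split into five folds.
folds = list(k_fold_indices(2000, k=5))
for i, (train_idx, val_idx) in enumerate(folds):
    print(f"fold {i}: {len(train_idx)} train, {len(val_idx)} validation")
```

In each round, a model would be trained on the four training folds and evaluated on the remaining fold, so that every image is used for validation exactly once; the per-fold scores are then typically averaged before models are compared.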