Scene information plays a crucial role in motion control, attitude perception, and path planning for wheeled planetary rovers (WPRs), and terrain recognition is a fundamental component of scene recognition. Because of the rich information they provide, visual sensors are commonly used for terrain classification; however, teleoperation delay prevents WPRs from exploiting visual information efficiently. To address this issue, an end-to-end deep learning (DL) approach that requires no complex image preprocessing was adopted. This paper first builds a terrain dataset from real Mars images, consisting of five classes (loose sand, bedrock, small rock, large rock, and outcrop), to directly support You Only Look Once (YOLOv5) and evaluate its performance on terrain classification. Because the capability of an end-to-end training scheme is positively correlated with the size of the dataset, the performance of YOLOv5 can be significantly improved by exploiting orders of magnitude more data. The best combination of hyperparameters and model variants was obtained by lightly tuning YOLOv5, and data augmentation was applied to further improve its accuracy. Furthermore, its performance was compared with that of two other end-to-end network architectures. Such deep learning algorithms can support future planetary exploration missions, for example by improving WPR autonomy, enabling traversability analysis, and helping rovers avoid getting trapped.
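As a rough illustration of the end-to-end training scheme summarized above, the following is a minimal sketch of how a five-class Mars terrain dataset might be wired into the standard ultralytics/yolov5 training pipeline. The dataset paths, file names, epoch/batch settings, and augmentation hyperparameter file are illustrative assumptions, not the configuration reported in this paper.

```python
# Hypothetical sketch: fine-tuning YOLOv5 on a five-class Mars terrain dataset.
# Paths and hyperparameter values are assumptions for illustration only.
import subprocess
from pathlib import Path

import torch

# Dataset description in the YAML format expected by ultralytics/yolov5
# (assumed layout: images/{train,val} with matching labels/{train,val}).
data_yaml = """\
path: datasets/mars_terrain   # assumed dataset root
train: images/train
val: images/val
nc: 5
names: ['loose_sand', 'bedrock', 'small_rock', 'large_rock', 'outcrop']
"""
Path("mars_terrain.yaml").write_text(data_yaml)

# Launch the standard YOLOv5 training script (assumes the ultralytics/yolov5
# repository is cloned and its requirements are installed). Augmentation
# strength (mosaic, mixup, HSV jitter, flips, ...) is set via the --hyp YAML.
subprocess.run(
    [
        "python", "train.py",
        "--img", "640",              # input resolution
        "--batch", "16",
        "--epochs", "300",
        "--data", "mars_terrain.yaml",
        "--weights", "yolov5s.pt",   # start from a pretrained small variant
        "--hyp", "data/hyps/hyp.scratch-low.yaml",
    ],
    check=True,
)

# Load the best checkpoint and run inference on a new Mars image.
model = torch.hub.load("ultralytics/yolov5", "custom",
                       path="runs/train/exp/weights/best.pt")
results = model("example_mars_image.jpg")  # hypothetical image path
results.print()
```

Swapping `yolov5s.pt` for a larger variant (e.g., `yolov5m.pt` or `yolov5l.pt`) and adjusting the augmentation YAML is one way to explore the model/hyperparameter combinations mentioned above.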