To achieve accurate and efficient instance segmentation of multiple defects in tunnel linings, this paper proposes a simple yet powerful Teacher-Student Framework (TeSF) that leverages an emerging Large Vision Model (LVM) and the advanced You Only Look Once v5 (YOLO v5) model. TeSF integrates a pre-trained LVM into the Teacher Module to reduce data annotation effort. The Student Module introduces a novel top-down architecture, YOLO-SH, which combines YOLO v5 for top-level classification and localization with a Segment Head for lower-level segmentation. Through a purpose-designed loss function, the Teacher Module acts as a data engine that drives automatic learning in the Student Module. TeSF is tested on images collected from Shanghai metro tunnels to automatically recognize five types of tunnel surface defects. Experimental results indicate that: (1) the LVM-based annotation procedure in the Teacher Module is more effective than traditional manual annotation; (2) a medium-sized YOLO v5 backbone gives the best balance between computational efficiency and segmentation accuracy, yielding mask mAP values of 0.644 and 0.694 at an inference time of 6.2 ms per image; and (3) the top-down Student Module with YOLO-SH v5m outperforms state-of-the-art instance segmentation models, improving box mAP and mask mAP by at least 8.2% and 6.3%, respectively. In short, the novelty of TeSF lies in using a pre-trained LVM to streamline data annotation, combined with YOLO-SH for more cost-effective and precise detection of multiple tunnel defects. TeSF can also be applied to the analysis of 3D scanner images acquired in in-service tunnels.
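The abstract does not describe the internals of the Segment Head, so the following is only a rough, hypothetical illustration of the top-down idea: detect first (class, score, box), then segment each detection inside its own box. It assumes a YOLACT-style prototype-and-coefficient head, as used in Ultralytics YOLOv5-seg, where each instance mask is a sigmoid of a linear combination of shared prototype maps cropped to the box; the paper's actual head may differ. The names `Detection`, `assemble_mask`, `top_down_segment`, and `mask_iou`, as well as the thresholds, are illustrative assumptions. `mask_iou` shows the per-instance overlap on which mask mAP is built.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels; x2 and y2 are exclusive


@dataclass
class Detection:
    """Top-level output (hypothetical): class label, confidence, box, mask coefficients."""
    label: str
    score: float
    box: Box
    coeffs: Sequence[float]  # assumption: one coefficient per prototype map


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def assemble_mask(det: Detection, protos: Sequence[Grid], thr: float = 0.5) -> List[List[int]]:
    """Lower-level step (assumed YOLACT-style): combine prototypes, apply sigmoid,
    keep only pixels inside the detection box, then binarize at `thr`."""
    h, w = len(protos[0]), len(protos[0][0])
    x1, y1, x2, y2 = det.box
    mask = [[0] * w for _ in range(h)]
    # Clip the box to the image and evaluate the mask only inside it (the "crop").
    for r in range(max(0, y1), min(h, y2)):
        for c in range(max(0, x1), min(w, x2)):
            logit = sum(k * p[r][c] for k, p in zip(det.coeffs, protos))
            mask[r][c] = 1 if sigmoid(logit) >= thr else 0
    return mask


def top_down_segment(
    dets: Sequence[Detection], protos: Sequence[Grid], score_thr: float = 0.25
) -> List[Tuple[Detection, List[List[int]]]]:
    """Top-down pipeline: drop low-confidence detections, then segment each remaining one."""
    return [(d, assemble_mask(d, protos)) for d in dets if d.score >= score_thr]


def mask_iou(a: List[List[int]], b: List[List[int]]) -> float:
    """Intersection over union of two binary masks of equal size."""
    inter = union = 0
    for ra, rb in zip(a, b):
        for va, vb in zip(ra, rb):
            inter += va & vb
            union += va | vb
    return inter / union if union else 0.0
```

For example, with one all-ones prototype and one prototype that is positive on the left half and negative on the right, a detection whose coefficients weight only the second prototype yields a mask covering the left half of its box. Restricting each mask to its own box is what keeps nearby defects of different classes from sharing pixels in this kind of design.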