Background/Objectives: Accurate identification of dental malocclusions from routine clinical photographs can be time-consuming and subject to interobserver variability. A YOLOv11-based deep learning approach is presented and evaluated for automatic malocclusion detection on routine intraoral photographs, testing the hypothesis that training on data labeled under a structured annotation protocol enables reliable detection of multiple clinically relevant malocclusions. Methods: An anonymized dataset of 5854 intraoral photographs (frontal occlusion; right/left buccal; maxillary/mandibular occlusal) was labeled according to standardized instructions derived from the Index of Orthodontic Treatment Need (IOTN). A total of 17 clinically relevant classes were annotated with bounding boxes. Due to an insufficient number of examples, two malocclusions (transposition and non-occlusion) were excluded from the quantitative analysis. A YOLOv11 model was trained with augmented data and evaluated on a held-out test set using mean average precision at an IoU threshold of 0.5 (mAP50), macro precision (macro-P), and macro recall (macro-R). Results: Across the 15 analyzed classes, the model achieved 87.8% mAP50, 76.9% macro-P, and 86.1% macro-R. The highest per-class AP50 values were observed for Deep bite (98.8%), Diastema (97.9%), Angle Class II canine (97.5%), Anterior open bite (92.8%), Midline shift (91.8%), Angle Class II molar (91.1%), Spacing (91.0%), and Crowding (90.1%). Moderate performance was observed for Anterior crossbite (88.3%), Angle Class III molar (87.4%), Head bite (82.7%), and Posterior open bite (80.2%). Lower values were seen for Angle Class III canine (76.0%), Posterior crossbite (75.6%), and Big overjet (75.3%). Precision-recall trends indicate an earlier precision drop-off for posterior/transverse classes and comparatively more missed detections for Posterior crossbite, whereas Big overjet exhibited more false positives at the chosen confidence threshold. Conclusion: A YOLOv11-based deep learning system can accurately detect several clinically salient malocclusions on routine intraoral photographs, supporting efficient screening and standardized documentation. Performance gaps align with limited training examples and visualization constraints in posterior regions. Larger, multi-center datasets, protocol standardization, quantitative metrics, and multimodal inputs may further improve robustness.
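For readers who wish to see how such a pipeline is typically set up, the following is a minimal sketch of training and validating a YOLO11 detector with the Ultralytics Python API. It is not the authors' reported configuration: the dataset file name, model variant, epoch count, image size, and augmentation values are illustrative assumptions. Here macro-P and macro-R denote the unweighted means of per-class precision and recall, which the Ultralytics metrics object exposes as mp and mr, and mAP50 is the mean of per-class average precision at IoU 0.5.

```python
# Minimal sketch, assuming the Ultralytics Python API (pip install ultralytics).
# "malocclusion.yaml", the model variant, epochs, image size, and augmentation
# values are hypothetical placeholders, not the study's reported settings.
from ultralytics import YOLO

# Start from pretrained YOLO11 weights; the study does not specify the variant.
model = YOLO("yolo11m.pt")

# Train with geometric augmentation; the dataset YAML would list the 17
# annotated malocclusion classes and the photograph train/val/test splits.
model.train(
    data="malocclusion.yaml",  # hypothetical dataset config
    epochs=100,                # assumed schedule
    imgsz=640,                 # assumed input resolution
    degrees=10.0,              # assumed rotation augmentation
    scale=0.2,                 # assumed scale augmentation
)

# Evaluate on the held-out test split with the metrics used in the abstract:
# mAP50 (mean AP at IoU 0.5), macro-P (mean per-class precision), and
# macro-R (mean per-class recall).
metrics = model.val(data="malocclusion.yaml", split="test")
print(f"mAP50:   {metrics.box.map50:.3f}")
print(f"macro-P: {metrics.box.mp:.3f}")
print(f"macro-R: {metrics.box.mr:.3f}")
```

Under the same assumptions, per-class AP50 values of the kind listed in the Results are available from the validation object (e.g., metrics.box.ap50, indexed by class).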