Background
Artificial intelligence (AI) is increasingly applied in medicine, yet its clinical integration in hand surgery remains variable and incompletely validated. This systematic review and meta-analysis evaluated current AI applications in hand surgery and benchmarked performance against human comparators where available.
Methods
Following PRISMA 2020 guidelines, PubMed/MEDLINE, Embase, Web of Science, and the Cochrane Library were searched through October 2025. Eligible studies evaluated AI systems in hand or wrist surgery with reported performance metrics. Outcomes included diagnostic accuracy, prognostic discrimination, concordance with clinical recommendations, workflow impact, and user satisfaction. Meta-analysis using a bivariate random-effects model was performed when ≥3 comparable studies were available and was restricted to radiograph-based fracture detection (distal radius and scaphoid). All other applications were synthesized narratively due to heterogeneity. The protocol was registered with PROSPERO (CRD420251230505).
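As an illustration of the pooling step only, the sketch below combines per-study sensitivities on the logit scale with a univariate DerSimonian-Laird random-effects estimator. This is a simplified stand-in, not the bivariate model used in the review, which pools sensitivity and specificity jointly and accounts for their correlation. The function name `pool_proportion`, the 0.5 continuity correction, and the study counts are all hypothetical and are not taken from the included studies.

```python
import math


def inv_logit(x):
    """Map a logit-scale value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_proportion(events, totals):
    """Pool proportions (e.g. per-study sensitivity) on the logit scale.

    Univariate DerSimonian-Laird random-effects estimator; a simplified
    stand-in for a bivariate model. Returns (pooled proportion, tau^2).
    """
    logits, variances = [], []
    for e, n in zip(events, totals):
        # 0.5 continuity correction (an assumption) keeps logits finite
        a = e + 0.5
        b = n - e + 0.5
        logits.append(math.log(a / b))
        variances.append(1.0 / a + 1.0 / b)

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, logits)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logits))

    # Between-study variance tau^2, truncated at zero
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    k = len(logits)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, logits)) / sum(w_re)
    return inv_logit(pooled), tau2


# Hypothetical counts: fractures correctly flagged / fractures present
true_positives = [88, 45, 130]
fractures_present = [95, 52, 140]
sensitivity, tau2 = pool_proportion(true_positives, fractures_present)
print(f"pooled sensitivity = {sensitivity:.3f}, tau^2 = {tau2:.3f}")
```

The same function can be applied to true negatives over non-fracture cases to obtain a pooled specificity, though pooling the two measures separately ignores the threshold-driven trade-off that a bivariate model captures.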
Results
Of 1228 screened records, 98 studies met inclusion criteria, most addressing diagnostic imaging. For distal radius fractures, pooled AI sensitivity and specificity were 92 % and 89 %, compared with 95 % and 94 % for human readers. For scaphoid fractures, AI demonstrated higher sensitivity (85 % vs. 71 %) but lower specificity (83 % vs. 93 %). Prognostic machine-learning models outperformed clinician estimates in selected retrospective cohorts (mean accuracy 78 % vs. 65 %), although calibration and external validation were inconsistently reported. Large language models demonstrated feasibility in simulated settings, achieving passing specialty-exam scores and generating high-quality documentation (mean satisfaction 4.6/5), while showing high sensitivity but variable specificity in treatment recommendations. Robotic and instrument-tracking applications remain experimental.
Conclusions
AI demonstrates promise in selected hand-surgery tasks, particularly fracture detection, outcome prediction, and documentation support. However, evidence is predominantly retrospective and single-center. Prospective multicenter validation and careful attention to bias, transparency, and ethical safeguards are required before routine clinical adoption. AI should augment, not replace, clinical expertise.
Level of evidence
II (systematic review/meta-analysis of predominantly Level II–III studies).
