Objectives: To develop and evaluate JADE, a proof-of-concept retrieval-augmented generation (RAG) diagnostic assistive system, designed to enhance large language model (LLM) reasoning for jawbone lesion assessment. This study examined whether RAG improves diagnostic accuracy and stability compared with standalone LLMs and ORAD, a supervised learning-based system.
Methods: JADE was developed as a cloud-based RAG system integrating an expert-curated oral radiology database, embedded using text-embedding-3-large and indexed in Qdrant Cloud. Structured clinical inputs were encoded as prioritized vector queries. Hybrid semantic and keyword-based retrieval supplied relevant evidence, which was appended to the prompt for differential diagnosis generation across LLM backbones. Performance was evaluated on 25 validation cases, comparing four standalone LLMs (GPT-5, Claude Sonnet 4.5, DeepSeek-R1, and Gemini 2.5 Flash), their RAG configurations, and ORAD. Diagnostic accuracy was analysed using Cochran's Q test, followed by post-hoc McNemar's tests with Bonferroni correction. Intra-model stability and response time were also assessed.
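The hybrid retrieval and prompt-augmentation steps described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not JADE's implementation. The abstract does not say how semantic and keyword results are combined, so the sketch uses reciprocal rank fusion (RRF), one common fusion technique. A simple term-overlap count stands in for keyword search, and cosine similarity over precomputed vectors stands in for text-embedding-3-large and Qdrant Cloud. The names `hybrid_search`, `build_prompt` and `TOY_DOCS`, the constant `k=60`, the prompt template, and the toy documents and vectors are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base: id -> (text, embedding vector).
# Real embeddings (e.g. from text-embedding-3-large) have far more dimensions.
TOY_DOCS = {
    "ameloblastoma": ("multilocular radiolucency posterior mandible", [0.9, 0.1, 0.0]),
    "odontogenic_keratocyst": ("unilocular radiolucency along mandibular body", [0.7, 0.3, 0.1]),
    "fibrous_dysplasia": ("ground glass radiopacity maxilla", [0.0, 0.2, 0.9]),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Count occurrences of distinct query terms in the text (stand-in for BM25)."""
    terms = set(query.lower().split())
    counts = Counter(text.lower().split())
    return sum(counts[t] for t in terms)


def rrf(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def hybrid_search(query_text, query_vec, docs, top_k=3, k=60):
    """Rank docs by semantic and keyword relevance separately, then fuse with RRF."""
    semantic = sorted(docs, key=lambda d: cosine(query_vec, docs[d][1]), reverse=True)
    keyword = sorted(docs, key=lambda d: keyword_score(query_text, docs[d][0]), reverse=True)
    return rrf([semantic, keyword], k=k)[:top_k]


def build_prompt(case_summary, evidence):
    """Append retrieved evidence to a differential-diagnosis prompt (hypothetical template)."""
    lines = ["Clinical case:", case_summary, "", "Reference evidence:"]
    lines += [f"- {item}" for item in evidence]
    lines += ["", "Give a ranked differential diagnosis."]
    return "\n".join(lines)


if __name__ == "__main__":
    query = "multilocular radiolucency mandible"
    hits = hybrid_search(query, [0.85, 0.15, 0.05], TOY_DOCS, top_k=2)
    print(hits)  # ['ameloblastoma', 'odontogenic_keratocyst']
    print(build_prompt(query, [TOY_DOCS[h][0] for h in hits]))
```

RRF fuses ranks rather than raw scores, so cosine similarities and keyword counts never need to be put on a common scale. Other fusion schemes, such as weighted score blending, would serve the same role.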
Results: RAG-GPT-5 achieved the highest diagnostic accuracy (20/25), followed by RAG-Claude Sonnet 4.5 (18/25), RAG-DeepSeek-R1 (17/25), and RAG-Gemini 2.5 Flash (15/25). Standalone models achieved 9-13/25 correct diagnoses, and ORAD achieved 17/25. No significant differences were observed among the standalone models, or among the RAG-based models and ORAD. GPT-5 improved significantly when integrated with RAG (p = 0.002). RAG configurations showed higher intra-model stability than standalone models, with RAG-GPT-5 achieving a mean stability of 0.90 ± 0.11. Mean response times ranged from 3 to 10 seconds.
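The paired comparisons reported in the Results, such as GPT-5 with and without RAG, follow the analysis plan stated in the Methods: an omnibus Cochran's Q test across configurations, then post-hoc McNemar's tests with Bonferroni correction. The sketch below shows these calculations on a cases × methods table of correct (1) and incorrect (0) diagnoses. It assumes the exact binomial form of McNemar's test, because the abstract does not specify which variant was used. The function names, the toy table and the discordant counts are hypothetical and are not the study's data.

```python
from math import comb


def cochran_q(table):
    """Cochran's Q for a cases x methods table of 0/1 outcomes (1 = correct).

    Q = (k - 1) * (k * sum_j C_j^2 - N^2) / (k * N - sum_i R_i^2),
    with C_j the column totals, R_i the row totals and N the grand total.
    Under H0, Q is approximately chi-square with k - 1 degrees of freedom
    (e.g. p = scipy.stats.chi2.sf(Q, k - 1)).
    """
    k = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    n_total = sum(row_totals)
    denom = k * n_total - sum(r * r for r in row_totals)
    if denom == 0:  # every case scored identically by all methods
        return 0.0
    return (k - 1) * (k * sum(c * c for c in col_totals) - n_total ** 2) / denom


def mcnemar_exact(b, c):
    """Two-sided exact (binomial) McNemar p-value.

    b: cases correct with method A only; c: cases correct with method B only.
    Concordant cases do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Hypothetical 4-case x 3-method table, not study data.
    toy = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
    print(cochran_q(toy))  # 3.0
    # Hypothetical discordant counts for one paired comparison.
    p = mcnemar_exact(0, 8)
    print(p, bonferroni([p, 0.2, 0.5]))
```

Only discordant cases (correct under one configuration but not the other) contribute to McNemar's test. The result therefore depends on how the correctly diagnosed cases overlap between two configurations, not only on their totals of correct diagnoses.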
Conclusions: JADE improved diagnostic accuracy and stability compared with standalone LLMs, underscoring the value of RAG-enhanced reasoning in jawbone lesion assessment and marking the first application of RAG in dentomaxillofacial radiology.
