Purpose
Salivary gland tumours pose a significant diagnostic challenge because of their histological diversity and overlapping pathological features. Immunohistochemistry (IHC) is a crucial tool for distinguishing these neoplasms, yet selecting the optimal IHC panel remains complex. Artificial intelligence (AI) applications hold promise for improving diagnostic accuracy by recommending appropriate IHC markers. This study evaluates the diagnostic performance of ChatGPT-4 in recommending IHC markers for salivary gland tumours.
Methods
A free version of ChatGPT-4 was used to generate IHC panel recommendations for 21 salivary gland tumour types. A consensus panel of expert pathologists established reference IHC panels, which served as the gold standard for comparison. Chatbot responses were assessed with a structured scoring system measuring accuracy, completeness, and relevance. Cohen's kappa quantified agreement between chatbot and pathologist recommendations, while sensitivity, specificity, and F1-scores evaluated diagnostic performance. Consistency across repeated chatbot queries was assessed using repeated-measures ANOVA and Bland-Altman analysis. Performance was also compared against a rule-based system that followed strict pathologist guidelines.
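As a minimal sketch (not the study's actual analysis code), the per-tumour agreement and performance metrics described above can be computed from binary marker-recommendation vectors; the marker list and values below are hypothetical and chosen only for illustration.

# Illustrative sketch with hypothetical data; marker names and recommendation values are assumptions.
from sklearn.metrics import cohen_kappa_score, f1_score, confusion_matrix

candidate_markers = ["CK7", "S100", "SOX10", "p63", "DOG1", "GFAP"]  # hypothetical candidate panel

# Hypothetical reference (pathologist consensus) and chatbot recommendations for one tumour type,
# aligned to candidate_markers (1 = marker recommended, 0 = not recommended).
reference = [1, 1, 1, 0, 0, 1]
chatbot   = [1, 1, 0, 1, 0, 1]

kappa = cohen_kappa_score(reference, chatbot)                     # chance-corrected agreement
f1 = f1_score(reference, chatbot)                                 # balances precision and recall
tn, fp, fn, tp = confusion_matrix(reference, chatbot, labels=[0, 1]).ravel()
sensitivity = tp / (tp + fn)                                      # reference markers correctly recommended
specificity = tn / (tn + fp)                                      # non-indicated markers correctly omitted

print(f"kappa={kappa:.2f}  sensitivity={sensitivity:.2f}  specificity={specificity:.2f}  F1={f1:.2f}")

Aggregating such per-tumour scores across the 21 tumour types would yield the overall agreement and performance figures reported in the Results.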
Results
The chatbot demonstrated moderate overall agreement with pathologists (κ=0.53), with higher agreement for benign (κ=0.67) than for malignant tumours (κ=0.40). Sensitivity was high (>0.80) for benign tumours but lower for malignant neoplasms. The chatbot frequently recommended unnecessary markers, reducing specificity (<0.50) for several malignancies. Stability analysis revealed variability in chatbot responses, particularly for complex tumour types. ChatGPT-4 underperformed relative to the rule-based system, exhibiting higher rates of incorrect and missed markers.
Conclusion
AI-powered chatbots show potential in aiding IHC marker selection but currently lack the precision required for clinical implementation.
