Objective: To compare the clinical decision-support performance of 2 large language models (LLMs), ChatGPT-5 and ChatGPT-5 Thinking, with that of experienced clinicians and novices in veterinary theriogenology.
Methods: Fifteen standardized obstetric and gynecologic scenarios were independently evaluated by 2 expert clinicians, 2 novice veterinarians, and both LLMs under matched, cold-start conditions. Responses were assessed with a 5-point global quality score by a blinded expert panel.
Results: ChatGPT-5 Thinking achieved the highest overall quality ratings, followed by ChatGPT-5 and the expert clinicians. Novice veterinarians received the lowest scores. LLM-generated responses were generally more consistent and complete than those of the human respondents.
Conclusions: Within the constraints of a simulated scenario design, LLMs, particularly ChatGPT-5 Thinking, provided clinically appropriate guidance that exceeded novice performance and approached that of expert clinicians. These findings support the potential role of LLMs as adjunct decision-support tools in time-sensitive obstetric and gynecologic cases.
Clinical relevance: LLMs may assist clinicians and trainees in managing reproductive emergencies by offering rapid, structured, guideline-aligned recommendations. Further evaluation in real clinical settings is warranted.