Introduction: Large language models (LLMs) are increasingly used in healthcare, including urogynecology, where stigma may limit open discussion. LLM-based chat platforms may provide a less intimidating and more accessible way for patients to obtain information, but their reliability requires evaluation. This study compared the quality of ChatGPT-generated responses in urogynecology with those provided by a consultant urogynecologist, focusing on understandability, helpfulness, and reassurance.
Material and methods: A cross-sectional survey was conducted among urogynecology patients. After informed consent, participants reviewed responses to six common questions, each answered independently by ChatGPT and by a single consultant urogynecologist. A blinded third-party consultant verified the clinical accuracy of all responses. Patients rated each response on a 5-point Likert scale across three domains (maximum score 15 per response). Paired comparisons used Wilcoxon signed-rank tests.
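For readers reproducing this type of analysis, the paired comparison can be illustrated with a minimal sketch; the Wilcoxon signed-rank test is a standard choice for paired ordinal totals that need not be normally distributed. The data below are simulated and the variable names are illustrative assumptions, not the study's actual dataset.

```python
# Minimal sketch of the paired analysis described above. Data are simulated
# with a random generator; real analysis would load the survey ratings instead.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
n_patients, n_questions, n_domains = 203, 6, 3

# 5-point Likert ratings per patient x question x domain, one array per responder.
chatgpt = rng.integers(1, 6, size=(n_patients, n_questions, n_domains))
consultant = rng.integers(1, 6, size=(n_patients, n_questions, n_domains))

# Per-response score: sum of 3 domain ratings (max 15); per-patient total:
# sum over 6 questions (max 90), matching the totals reported in Results.
total_chatgpt = chatgpt.sum(axis=(1, 2))
total_consultant = consultant.sum(axis=(1, 2))

# Non-parametric test on the paired per-patient totals.
stat, p = wilcoxon(total_chatgpt, total_consultant)
print(f"Wilcoxon signed-rank: W = {stat:.1f}, p = {p:.3f}")
```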
Results: A total of 203 patients participated (median age 56 years, interquartile range 46-66). ChatGPT responses received higher total ratings than consultant responses (median [IQR] 76 [67-85] vs. 72 [63-80] of a maximum 90; p < 0.01). ChatGPT also scored higher in each domain: understandability, helpfulness, and reassurance (all p < 0.01). ChatGPT was preferred for four of the six questions, one question showed no significant difference, and one favored the consultant. Subgroup analyses showed no significant variation by patient characteristics.
Conclusions: In this exploratory study, women rated ChatGPT's responses as clearer and more reassuring than the consultant's answers. These findings reflect patient perceptions in a limited setting and should be interpreted with caution. While LLMs may have a supportive role in patient education, their use must remain secondary to expert clinical care and subject to careful oversight.