Purpose
To evaluate and compare the quality, accuracy, understandability, and actionability of responses generated by a health-specific AI chatbot (Health Answers by Pfizer) and a general-purpose AI chatbot (ChatGPT/GPT-5) to ophthalmology-related patient queries.
Methods
We entered the top five Google Trends search queries for each of three leading causes of blindness worldwide (glaucoma, cataract, and age-related macular degeneration) into both chatbots. We evaluated each chatbot response with the Flesch Reading Ease test, the Flesch-Kincaid Grade Level, the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P), and the DISCERN instrument to assess the readability, quality, accuracy, understandability, and actionability of each response.
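For reference, the two readability metrics are computed from sentence length and syllable counts; their standard formulas (well-established definitions, stated here for context rather than taken from this study) are:

Flesch Reading Ease = 206.835 - 1.015 x (total words / total sentences) - 84.6 x (total syllables / total words)

Flesch-Kincaid Grade Level = 0.39 x (total words / total sentences) + 11.8 x (total syllables / total words) - 15.59

Higher Flesch Reading Ease scores indicate easier-to-read text, whereas higher Flesch-Kincaid values indicate a higher required grade level.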
Results
ChatGPT-5 produced responses that were easier to read (Flesch Reading Ease of 48.1 vs 39.0, p = 0.02) and written at a lower grade level (Flesch-Kincaid Grade Level of 8.9 vs 12.2, p = 0.003). ChatGPT-5 also scored higher for understandability (PEMAT-P understandability scores of 83.8% vs 80.5%, p = 0.024) and information quality (DISCERN scores of 41.3 vs 36.4, p = 0.047). In contrast, Health Answers by Pfizer produced content that was significantly more actionable (PEMAT-P actionability scores of 41.3% vs 23.3%, p = 0.004).
Conclusions
This study highlights the utility of Health Answers by Pfizer in producing more actionable content, whereas ChatGPT-5 produced content of greater understandability, quality, and readability. For effective patient education in ophthalmology, it is paramount that AI chatbots be improved to balance clarity with actionability, especially given the critical nature of silently progressing diseases such as glaucoma.
