Introduction
We conducted a cross-sectional, blinded expert evaluation of AI-generated answers to 22 frequently asked pregnancy questions to characterize content quality and potential clinical utility.
Methods
Five obstetricians (not involved in rating) compiled the questions; ChatGPT produced responses using a minimal prompt with a fresh session per item. Forty board-certified OB/GYNs rated each answer on 5-point Likert scales for accuracy, comprehensiveness, safety, and understandability; two deliberately incorrect attention-check items were embedded and excluded from the analysis.
Results
We obtained 879/880 expected rating blocks (approximately 0.1% missing). Domain means clustered tightly (accuracy 3.95 ± 0.20, safety 3.94 ± 0.16, understandability 3.94 ± 0.19, comprehensiveness 3.91 ± 0.17), with no overall domain difference (Friedman χ²(3) = 3.13, p = 0.372). Question-level means ranged from 3.71 to 4.31, highest for routine daily-life topics (air travel, sexual activity, sleep position, exercise) and lowest for context-dependent items (e.g., non-stress test [NST] 3.71; heartburn 3.72; edema 3.79; vaginal bleeding 3.81). Pre-specified subgroups showed a small but significant difference (Kruskal–Wallis p = 0.033): daily life scored higher than follow-up/testing/procedures (adjusted p < 0.05), whereas daily life vs symptoms and symptoms vs follow-up were not significant. In domain × subgroup analyses, only understandability differed (p = 0.020), with daily life > symptoms (adjusted p = 0.043); safety’s global difference did not yield significant pairwise contrasts. Overall inter-rater reliability was moderate, supporting consistent expert evaluation while underscoring increased variability in symptom-based assessments.
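As a reader-side consistency check (not the authors' analysis code), the reported Friedman p-value can be recomputed from the test statistic alone, since the statistic is referred to a chi-square distribution with 3 degrees of freedom. The sketch below uses the closed-form upper-tail probability for that distribution; the function name `chi2_sf_df3` is hypothetical, and the count of expected rating blocks assumes each of the 40 raters rated each of the 22 questions once.

```python
import math


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df.

    Closed form: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


# Reported Friedman statistic across the four rating domains (df = 4 - 1 = 3).
friedman_stat = 3.13
p_value = chi2_sf_df3(friedman_stat)

# Assumption: one rating block per rater per question.
expected_blocks = 40 * 22

print(f"Friedman p = {p_value:.3f}")
print(f"Expected rating blocks = {expected_blocks}")
```

Under these assumptions the recomputed values agree with the reported p = 0.372 and the denominator of 880 rating blocks.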
Conclusions
Experts rated the AI-generated answers as moderate-to-high overall; however, inter-rater reliability was only moderate and varied markedly by question type (highest for daily life questions and very low for symptom-related questions), indicating heterogeneous clinician judgments and supporting cautious interpretation of these findings.
