Objective
To compare the quality of online information about human papillomavirus (HPV)–associated oropharyngeal cancer generated by a large language model with content retrieved from conventional web search and authoritative guideline-based sources.
Methods
Twenty high-volume patient search queries were identified using global Google Trends data. For each question, responses were obtained from GPT-4 (OpenAI), the highest-ranked non-sponsored Google Search result, and leading governmental or guideline-based websites. Responses were anonymized and evaluated in a blinded manner by seven otolaryngology specialists and ten adult laypersons. Experts assessed accuracy, clarity, completeness, relevance, and usefulness; laypersons rated clarity, trustworthiness, and usefulness. Comparative analyses were performed using Friedman tests and Bonferroni-corrected Wilcoxon signed-rank tests, with inter-rater agreement estimated using intraclass correlation coefficients (ICCs).
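The analysis pipeline described above can be sketched as follows. This is an illustrative example with synthetic ratings, not the study's data; the rating arrays, seed, and comparison labels are assumptions for demonstration only.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# Hypothetical 1-5 Likert ratings for 20 questions from three sources
# (one value per question); synthetic, not the study's actual data.
rng = np.random.default_rng(42)
gpt4 = rng.integers(4, 6, size=20)    # assumed: mostly 4-5
google = rng.integers(2, 5, size=20)  # assumed: mostly 2-4
guide = rng.integers(3, 6, size=20)   # assumed: mostly 3-5

# Friedman omnibus test across the three related samples.
stat, p_friedman = friedmanchisquare(gpt4, google, guide)

# Pairwise Wilcoxon signed-rank tests with Bonferroni correction
# for the three post-hoc comparisons.
pairs = [
    ("GPT-4 vs Google", gpt4, google),
    ("GPT-4 vs guidelines", gpt4, guide),
    ("Google vs guidelines", google, guide),
]
alpha_corrected = 0.05 / len(pairs)  # Bonferroni-adjusted threshold
for name, a, b in pairs:
    w, p = wilcoxon(a, b)
    print(f"{name}: p={p:.4f}, significant={p < alpha_corrected}")
```

In practice each domain (accuracy, clarity, etc.) and rater cohort would be analyzed separately, and ICCs would be computed on the full rater-by-item matrix rather than per-question means.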
Results
GPT-4-generated responses received higher mean ratings than Google Search results across all domains for both rater cohorts (p < 0.001 for all comparisons). Experts rated GPT-4 and guideline-based content similarly for accuracy, completeness, and usefulness, while GPT-4 scored significantly higher for clarity and relevance (p < 0.01). Laypersons rated GPT-4 responses highest across all domains, with median scores of 5 versus 4 for the other sources. Inter-rater agreement was modest for the more subjective domains.
Conclusion
GPT-4-generated information on HPV-associated oropharyngeal cancer matched the accuracy and completeness of authoritative guideline-based content, demonstrated significantly greater clarity and relevance, and outperformed conventional web search results across all domains. LLMs may improve the accessibility and consistency of online patient education when implemented with expert oversight, transparent sourcing, and ongoing quality monitoring.
