Background: This letter addresses methodological aspects of a study that evaluated a large language model's responses to frequently asked patient questions about meniscus surgery. The original research collected common questions from orthopedic resources, submitted them to the model in separate sessions, and assessed the answers with a framework rating adequacy and the need for clarification. The purpose of this correspondence is to highlight methodological limitations in the prompting strategy and evaluation procedures that may have influenced the study's findings.
Methods: The study used zero-shot prompting without specifying audience level, communication role, or expected response style. Each response was rated by a single orthopedic specialist using a qualitative scale. This letter reviews these methods and discusses how alternative approaches could enhance validity and reproducibility.
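To illustrate the difference, the sketch below contrasts the unconstrained zero-shot style described above with a prompt that fixes the communication role, the intended audience, and the expected response style, and adds a one-shot example. The question, system instruction, and example answer are hypothetical and written in the widely used role/content chat message format; they are not the prompts used in the original study.

```python
# Minimal sketch with hypothetical prompts (not the study's actual wording),
# contrasting zero-shot prompting with role- and audience-specified prompting.

PATIENT_QUESTION = "How long is recovery after meniscus surgery?"  # illustrative

# Zero-shot, unconstrained: the model receives only the raw question.
zero_shot_messages = [
    {"role": "user", "content": PATIENT_QUESTION},
]

# Structured: a system instruction defines the role, audience, and style,
# and a one-shot question-answer pair demonstrates the expected format.
structured_messages = [
    {
        "role": "system",
        "content": (
            "You are an orthopedic surgeon answering a patient with no medical "
            "background. Use plain language at about an 8th-grade reading level, "
            "keep answers under 150 words, and say when to contact a clinician."
        ),
    },
    {"role": "user", "content": "Will I need crutches after meniscus surgery?"},
    {
        "role": "assistant",
        "content": (
            "Most people use crutches for a few days to a few weeks, depending on "
            "the type of repair. Your surgeon will tell you when it is safe to put "
            "full weight on the leg. Call your care team if pain or swelling worsens."
        ),
    },
    {"role": "user", "content": PATIENT_QUESTION},
]

if __name__ == "__main__":
    # No API call is made here; the message lists can be passed to most
    # chat-completion endpoints that accept role/content message pairs.
    for name, msgs in [("zero-shot", zero_shot_messages), ("structured", structured_messages)]:
        print(f"--- {name} prompt ({len(msgs)} message(s)) ---")
        for m in msgs:
            print(f"[{m['role']}] {m['content']}\n")
```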
Results: Zero-shot prompts without audience targeting or role-based instructions can yield general, non-specific outputs rather than patient-focused explanations. Structured prompting techniques, such as defining the audience or providing one-shot or few-shot examples, often improve clarity, consistency, and alignment with patient needs. In addition, assessment by a single rater increases the risk of subjective bias, because individual interpretation may influence scoring. Multi-rater evaluation with standardized agreement metrics, such as Cohen's kappa or an intraclass correlation coefficient, would provide a more reliable and objective assessment of response quality.
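As a concrete illustration of such agreement metrics, the minimal sketch below computes pairwise quadratic-weighted Cohen's kappa with scikit-learn for three hypothetical raters scoring the same responses on an ordinal 1-4 adequacy scale; all rating values are invented for illustration and are not data from the study.

```python
# Minimal sketch of a multi-rater agreement check using weighted Cohen's kappa.
# The three raters and their scores are hypothetical, not data from the study.

from itertools import combinations
from sklearn.metrics import cohen_kappa_score

# Three specialists rate the same 10 model responses on an ordinal 1-4 scale
# (1 = unsatisfactory ... 4 = excellent, requiring no clarification).
ratings = {
    "rater_A": [4, 3, 4, 2, 3, 4, 1, 3, 2, 4],
    "rater_B": [4, 3, 3, 2, 3, 4, 2, 3, 2, 4],
    "rater_C": [3, 3, 4, 2, 2, 4, 1, 3, 3, 4],
}

# Quadratic weighting suits ordinal scales: larger disagreements are penalized more.
for (a, scores_a), (b, scores_b) in combinations(ratings.items(), 2):
    kappa = cohen_kappa_score(scores_a, scores_b, weights="quadratic")
    print(f"{a} vs {b}: weighted kappa = {kappa:.2f}")
```

For a single summary across the full rater panel, Fleiss' kappa or an intraclass correlation coefficient can be reported instead of, or alongside, the pairwise values.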
Conclusions: The study offers useful preliminary insight into the potential of artificial intelligence tools for patient education. However, limitations in prompt design and evaluation methodology restrict the strength of its conclusions. Future studies employing structured prompts and multi-rater assessments may yield more robust and clinically meaningful evidence.

