Introduction: Self-care and self-medication are increasingly viewed as helpful approaches to managing minor ailments; however, patients often lack the confidence to make informed choices. Pharmacists have traditionally assisted patients in this domain, but the emergence of digital health technologies has shifted the way individuals seek health information toward artificial intelligence (AI) tools. ChatGPT-4o mini, Gemini, and Copilot have recently grown popular as sources of health-related guidance. Despite the accessibility and ease of use these AI tools offer, their accuracy, patient-centeredness, and reliability in supporting self-care remain insufficiently evaluated.
Aims and objectives: The primary objective of this study is to evaluate and compare the performance of ChatGPT-4o mini, Gemini, and Copilot in the context of patient self-care by assessing the accuracy, patient-centeredness, and comprehensiveness of their responses against standard recommendations.
Materials and methods: Ninety-one case scenarios representing the most common minor ailments were presented to the three AI models to generate responses, which three of the study investigators then assessed against established standard recommendations. Responses were evaluated for accuracy, patient-centeredness, comprehensiveness, and similarity. An inter-rater reliability test was also carried out to confirm consistency among the three evaluators' assessments.
Results: Across the 91 minor-ailment case scenarios, ChatGPT-4o mini significantly outperformed Gemini and Copilot in accuracy (mean ± SD; ChatGPT-4o mini: 4.4 ± 0.6, Gemini: 4.1 ± 0.8, Copilot: 3.7 ± 0.7; p < 0.001), patient-centeredness (ChatGPT-4o mini: 4.7 ± 0.6, Gemini: 4.3 ± 1.0, Copilot: 4.2 ± 0.8; p < 0.001), and comprehensiveness (ChatGPT-4o mini: 4.6 ± 0.7, Gemini: 4.2 ± 0.8, Copilot: 3.4 ± 0.7; p < 0.001). Gemini and Copilot showed moderate and low performance, respectively, particularly in complex cases. Inter-rater reliability was excellent (Cronbach's alpha ≥ 0.9), confirming assessment consistency. Cosine similarity analysis indicated high overlap between AI responses and standard recommendations.
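The abstract does not specify how the cosine similarity between AI responses and standard recommendations was computed; a minimal sketch, assuming a simple bag-of-words vectorization (the authors' actual pipeline may have used embeddings or TF-IDF), is:

```python
from collections import Counter
import math

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between two texts using bag-of-words term counts.

    Returns a value in [0, 1]: 1.0 for identical word distributions,
    0.0 when the texts share no words.
    """
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Dot product over the shared vocabulary
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a)
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For example, an AI response partially matching a standard recommendation (e.g. "rest and drink fluids" vs. "drink plenty of fluids") yields a score between 0 and 1, so higher averages across scenarios indicate closer agreement with the standard.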
Conclusion: This study shows that AI tools can serve as reliable and accurate aids for self-care of minor ailments. The findings highlight ChatGPT-4o mini's superior reliability and patient-centeredness for self-medication guidance, while underscoring the need for human oversight. The residual risk of variation and error in AI-generated responses precludes complete reliance on AI for self-care recommendations.