Objectives: Antimicrobial resistance is a critical public health threat. Large language models (LLMs) show considerable capability in providing health information. This study evaluates the effectiveness of LLMs in providing information on antibiotic use and infection management.
Methods: Using a mixed-methods approach, responses from ChatGPT 3.5, ChatGPT 4.0, Claude 2.0 and Gemini 1.0 to scenarios designed by healthcare experts, in both Italian and English, were analysed. Computational text analysis assessed readability, lexical diversity and sentiment, while content quality was assessed by three experts using the DISCERN tool.
Results: Sixteen scenarios were developed, yielding a total of 101 outputs and 5454 Likert-scale (1-5) scores for analysis. A general positive performance gradient was found from ChatGPT 3.5 and 4.0, to Claude, to Gemini. Gemini, although producing only five outputs before self-inhibition, consistently outperformed the other models across almost all metrics, generating more detailed, accessible and varied content with a positive overtone. ChatGPT 4.0 demonstrated the highest lexical diversity. A difference in performance by language was also observed. All models received a median score of 1 (IQR=2) on the domain addressing antimicrobial resistance.
Discussion: The study highlights a positive performance gradient towards Gemini, which showed superior content quality, accessibility and contextual awareness, albeit on a smaller set of outputs. Generating appropriate content to address antimicrobial resistance proved challenging for all models.
Conclusions: LLMs hold great promise for providing appropriate medical information. However, they should play a supporting role rather than replace medical professionals, confirming the need for expert oversight and improved artificial intelligence design.