Background: Antimicrobial resistance (AMR) poses a global threat to public health, and AI models have shown transformative potential in combating it. China's DeepSeek, a novel open-source, low-cost, and locally deployable AI model, is increasingly integrated into clinical workflows for infectious diseases, yet the pharmacological validity and real-world impact of the drugs it recommends remain poorly understood.
Objective: This study aimed to compare the antibacterial regimens recommended by DeepSeek (V3, R1, and R1 + WS), ChatGPT o1, and infectious disease (ID) specialists, and to evaluate the performance and timeliness of the two AI models.
Methods: A retrospective analysis was conducted on 101 cases with effective antibacterial therapy. DeepSeek and ChatGPT o1 were identically prompted using comprehensive case data to generate antibacterial regimens. Five independent clinical pharmacists then evaluated all outputs. The benchmark evaluation metrics included the concordance rate between each AI model's regimens and the patients' effective antibacterial regimens, as well as the proportion of regimens escalating therapy to higher-tier groups per the WHO AWaRe classification. Performance metrics for DeepSeek and ChatGPT o1 encompassed overlap rate, precision, recall, F1-score, ID specialist endorsement rate, search latency, and search success rate. Statistical analyses employed chi-square and Kruskal-Wallis tests.
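As an illustration of how set-based precision, recall, and F1-score can be computed for a single case, the sketch below compares a model's recommended drugs with a reference regimen. It is a minimal sketch under stated assumptions, not the study's scoring procedure: the abstract does not give the exact metric definitions, and the function name `regimen_scores`, the case-insensitive matching of drug names, and the example drugs are all hypothetical.

```python
def regimen_scores(recommended, reference):
    """Return (precision, recall, f1) for one case.

    Assumption: each regimen is treated as a set of drug names, and a
    recommended drug counts as correct only if it appears in the reference
    regimen (exact match after lowercasing and trimming whitespace).
    """
    rec = {drug.strip().lower() for drug in recommended}
    ref = {drug.strip().lower() for drug in reference}

    true_positives = len(rec & ref)

    # Precision: share of recommended drugs that are in the reference regimen.
    precision = true_positives / len(rec) if rec else 0.0
    # Recall: share of reference drugs that the model recommended.
    recall = true_positives / len(ref) if ref else 0.0
    # F1: harmonic mean of precision and recall (0 when both are 0).
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical case: the model adds a second agent to a single-drug reference.
precision, recall, f1 = regimen_scores(
    ["Meropenem", "Vancomycin"],  # model output (illustrative)
    ["meropenem"],                # reference regimen (illustrative)
)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under these assumptions, a model that lists more drugs than the reference regimen tends to keep recall high while losing precision.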
Results: DeepSeek-V3 demonstrated the highest overall concordance rate with ID specialists, exceeding those of DeepSeek-R1 and ChatGPT o1. The proportion of antibacterial regimens escalated to higher-tier groups was significantly greater for DeepSeek-R1, DeepSeek-R1 + WS, and ChatGPT o1 than for ID specialists (P < 0.005). Regarding the performance metrics, ChatGPT o1 achieved the highest overlap rate, while DeepSeek-R1 led in recall. DeepSeek-V3 achieved the highest F1-score and the highest overall ID specialist endorsement rate, reflecting the best balance between precision and recall. Search latency varied substantially (H = 305.53, P < 0.005), with DeepSeek-V3 and ChatGPT o1 exhibiting the fastest response times.
Conclusions: While moderate agreement exists between DeepSeek and ID specialists in antibiotic selection, DeepSeek models exhibit a marked tendency toward recommending higher-tier, broader-spectrum antibacterials. Moreover, DeepSeek's antibacterial clinical decision-making is comparable to that of ChatGPT o1, with DeepSeek-V3 surpassing it in certain performance metrics. These findings highlight the need for AI refinement to align with stewardship principles and contextual clinical judgment.
Trial registration: Chinese Clinical Trial Registry (ChiCTR2500100661); registration date: April 14, 2025; https://www.chictr.org.cn.
