Introduction: Artificial intelligence (AI)-powered chatbots, such as ChatGPT-4 and DeepSeek, are increasingly used to provide medical information. However, their accuracy, comprehensiveness, and reliability in specialized fields such as colorectal cancer remain under-evaluated. This study compared the performance of ChatGPT-4 and DeepSeek in responding to community- and expert-oriented questions about colorectal cancer.
Materials and methods: A total of 30 questions were formulated based on clinical experience, including 15 community-focused and 15 expert-oriented questions. On February 13, 2025, ChatGPT-4 (OpenAI, version 4.0) and DeepSeek-R1 (initial January 2025 release) were queried simultaneously in a single session. Responses were independently evaluated by four colorectal surgery experts for appropriateness (0-100), comprehensiveness (0-100), and reference provision (yes/no). Statistical analyses included Mann-Whitney U and chi-square tests, with significance set at p < 0.05.
Results: ChatGPT-4 and DeepSeek demonstrated comparable appropriateness scores (94.0 vs. 92.25, p > 0.05). In community-oriented questions, ChatGPT-4 showed significantly higher comprehensiveness (median 95.0, interquartile range (IQR) 92-98 vs. median 90.0, IQR 85-94; p = 0.044). Neither chatbot provided scientific references. Inter-rater agreement ranged from moderate to good, with slightly higher consistency observed for DeepSeek (appropriateness intraclass correlation coefficient (ICC) 0.83 vs. 0.81).
Discussion: Both chatbots exhibited distinct strengths and limitations. ChatGPT-4 demonstrated superior comprehensiveness in community-oriented responses, whereas DeepSeek's responses were rated slightly more consistently across evaluators. The absence of scientific references is a major limitation that restricts clinical applicability and reliability. Reference support and response consistency must improve before AI-powered chatbots can be safely integrated into colorectal cancer-related clinical decision-making.
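For readers who want to see how a score comparison of the kind named in the methods works, the following is a minimal sketch of a two-sided Mann-Whitney U test in plain Python. It is not the authors' analysis code: the function names `midranks` and `mann_whitney_u` are hypothetical, the p-value uses the normal approximation with tie correction and no continuity correction (statistical packages may default to other variants), and the scores at the bottom are made-up placeholders, not study data.

```python
import math


def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value via normal approximation)."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = midranks(combined)
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    # Tie-corrected variance of U under the null hypothesis.
    n = n1 + n2
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # every observation identical: no evidence of a difference
        return u_x, 1.0

    z = (u_x - n1 * n2 / 2) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided


# Hypothetical placeholder scores on a 0-100 scale (NOT data from the study).
scores_a = [95, 92, 98, 96, 93]
scores_b = [90, 85, 94, 88, 91]
u_stat, p_value = mann_whitney_u(scores_a, scores_b)
significant = p_value < 0.05  # threshold stated in the methods
```

With samples as small as the placeholders above, an exact test would normally be preferred over the normal approximation; the approximation is used here only to keep the sketch short.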
