Artificial intelligence (AI) models are increasingly used in clinical practice, including in medical education and the dissemination of updated clinical guidelines. In this study, we evaluated four AI tools (ChatGPT-4o, ChatGPT-o1, ChatGPT-o3Mini, and DeepSeek) on their ability to summarize the Standards of Care in Diabetes—2025 from the American Diabetes Association (ADA) for cardiovascular physicians in primary care settings [1].

Using a standardized prompt, we compared the AI-generated summaries across 10 metrics: accuracy (alignment with the ADA 2025 guidelines), completeness (coverage of core topics such as glycemic targets, blood pressure management, lipid control, and pharmacologic strategies), clarity (readability and conciseness for cardiovascular physicians), clinical relevance (utility for real-world cardiovascular practice), consistency (logical coherence and uniformity of recommendations), evidence support (reference to supporting studies and ADA standards), ethics (neutral, evidence-based recommendations), timeliness (inclusion of the latest ADA updates), actionability (practical guidance for cardiovascular physicians), and fluency (professional language and structure). Each summary was rated on a 0–5 scale for each metric, giving a maximum total score of 50 points. All summaries were anonymized to remove identifiers, and each model (ChatGPT-4o, ChatGPT-o1, ChatGPT-o3Mini, and DeepSeek) was then asked to evaluate all four anonymized summaries, including its own, against the 10 predefined metrics. For each summary, the four evaluators' scores (self-evaluation included) were averaged to obtain the final score per metric.
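To make the aggregation step concrete, the minimal Python sketch below reproduces this cross-evaluation scoring scheme. The model names and the 10 metrics come from the description above; the individual ratings are placeholder values, not the study's actual data.

```python
# Cross-evaluation scoring sketch: four models rate four anonymized summaries
# (including their own) on 10 metrics, and the four ratings per metric are averaged.
# The ratings below are placeholders, not the scores reported in the study.

MODELS = ["ChatGPT-4o", "ChatGPT-o1", "ChatGPT-o3Mini", "DeepSeek"]
METRICS = [
    "accuracy", "completeness", "clarity", "clinical relevance", "consistency",
    "evidence support", "ethics", "timeliness", "actionability", "fluency",
]

# ratings[evaluator][summary][metric] = score on the 0-5 scale
ratings = {
    evaluator: {summary: {metric: 4.5 for metric in METRICS} for summary in MODELS}
    for evaluator in MODELS
}

def final_scores(summary: str) -> dict[str, float]:
    """Average the four evaluators' ratings (self-evaluation included) for one summary."""
    return {
        metric: sum(ratings[evaluator][summary][metric] for evaluator in MODELS) / len(MODELS)
        for metric in METRICS
    }

for summary in MODELS:
    per_metric = final_scores(summary)
    total = sum(per_metric.values())  # 10 metrics x 5 points = 50 possible
    print(f"{summary}: {total:.1f}/50")
```

Summing the averaged per-metric scores gives each summary's total out of 50, which corresponds to the overall totals reported below.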
{"title":"Comparative Analysis of AI Tools for Disseminating ADA 2025 Diabetes Care Standards: Implications for Cardiovascular Physicians","authors":"Tengfei Zheng","doi":"10.1111/1753-0407.70072","DOIUrl":"https://doi.org/10.1111/1753-0407.70072","url":null,"abstract":"<p>Artificial intelligence (AI) models are increasingly used in clinical practice, including medical education and the dissemination of updated clinical guidelines. In this study, we evaluated four AI tools—ChatGPT-4o, ChatGPT-o1, ChatGPT-o3Mini, and DeepSeek—to assess their ability to summarize the <i>Standards of Care in Diabetes—2025</i> from the American Diabetes Association (ADA) for cardiovascular physicians in primary care settings [<span>1</span>].</p><p>Using a standardized prompt, we compared the AI-generated summaries across 10 key metrics, including accuracy (alignment with ADA 2025 guidelines), completeness (inclusion of core topics such as glycemic targets, blood pressure management, lipid control, and pharmacologic strategies), clarity (readability and conciseness for cardiovascular physicians), clinical relevance (utility for real-world cardiovascular practice), consistency (logical coherence and uniformity in recommendations), evidence support (reference to supporting studies and ADA standards), ethics (neutral and evidence-based recommendations), timeliness (inclusion of the latest ADA updates), actionability (practical guidance for cardiovascular physicians), and fluency (professional language and structure). Each AI tool was rated on a 0–5 scale for each category, yielding a total possible score of 50 points. All summaries were anonymized to remove identifiers. Each model (ChatGPT-4o, ChatGPT-o1, ChatGPT-o3Mini, and DeepSeek) was then tasked with evaluating all four anonymized summaries, including its own output, using the predefined 10 metrics. For each model, the four scores assigned by the evaluators (including self-evaluation) were averaged to calculate the final score per metric.</p><p>Our evaluation showed that ChatGPT-o1 performed best (48.3/50), excelling in completeness (5.0), clinical relevance (5.0), and actionability (5.0), with comprehensive coverage of diabetes screening, cardiovascular risk assessment, hypertension/lipid management, and multidisciplinary collaboration (Table 1). However, its evidence support (4.0) required improvement. ChatGPT-4o (45.5/50) demonstrated strengths in clarity (4.8) and structure but had limitations in timeliness (4.5) and evidence support (3.3), as it failed to incorporate 2025 guideline updates and lacked specific research references. The free models, O3Mini (47.3/50) and DeepSeek (47.3/50), performed comparably to paid tools. O3Mini excelled in consistency (5.0) and CKD/heart failure monitoring, while DeepSeek prioritized concise cardiovascular risk management (clarity: 5.0). 
Both free models, however, scored lower in completeness (O3Mini: 4.8; DeepSeek: 4.5) and evidence support (O3Mini: 4.0; DeepSeek: 3.8), reflecting insufficient integration of 2025 updates and trial data (Table 1).</p><p>Among the most critical takeaways for cardiovascular physicians were the importance of individualized glycemic targets, the use of SGLT2 inhibitors and GLP-1 receptor agonists for cardiovascu","PeriodicalId":189,"journal":{"name":"Journal of Diabetes","volume":"17 3","pages":""},"PeriodicalIF":3.0,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/1753-0407.70072","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143565103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}