Basal knowledge in the field of pediatric nephrology and its enhancement following specific training of ChatGPT-4 "omni" and Gemini 1.5 Flash.

Impact factor: 2.6; Zone 3 (Medicine); JCR Q1 (Pediatrics); Pediatric Nephrology; Pub Date: 2025-01-01; Epub Date: 2024-08-16; DOI: 10.1007/s00467-024-06486-3

Gianluca Mondillo, Vittoria Frattolillo, Simone Colosimo, Alessandra Perrotta, Anna Di Sessa, Stefano Guarino, Emanuele Miraglia Del Giudice, Pierluigi Marzuillo

Pediatric Nephrology, pp. 151-157 (Journal Article). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11584465/pdf/

Citations: 0

Abstract

Background: We aimed to evaluate the baseline performance and improvement of ChatGPT-4 "omni" (ChatGPT-4o) and Gemini 1.5 Flash (Gemini 1.5) in answering multiple-choice questions related to pediatric nephrology after specific training.

Methods: Using questions from the "Educational Review" articles published by Pediatric Nephrology between January 2014 and April 2024, the models were tested both before and after specific training with Portable Document Format (PDF) and text (TXT) versions of the Educational Review articles. A Python script removed the last page of each article, which contained the correct answers. The number of correct answers was recorded.
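
The answer-removal step described in the Methods can be illustrated for the TXT format. The abstract does not include the authors' script, so the following is only a minimal sketch under stated assumptions: pages in each text file are assumed to be separated by a form-feed character (as `pdftotext` emits by default), and the names `drop_last_page` and `strip_answer_pages` are hypothetical.

```python
from pathlib import Path

# Assumption: pages are delimited by form feeds (the pdftotext default).
FORM_FEED = "\f"


def drop_last_page(text: str, page_break: str = FORM_FEED) -> str:
    """Return `text` without its final page; pages are split on `page_break`."""
    # Ignore trailing page breaks so they do not count as an empty last page.
    pages = text.rstrip(page_break).split(page_break)
    if len(pages) <= 1:
        # Single-page input: nothing remains once the only page is removed.
        return ""
    return page_break.join(pages[:-1]) + page_break


def strip_answer_pages(src_dir: Path, dst_dir: Path) -> int:
    """Write answer-free copies of every .txt file in `src_dir`.

    Hypothetical helper: returns the number of files processed.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for src in sorted(src_dir.glob("*.txt")):
        cleaned = drop_last_page(src.read_text(encoding="utf-8"))
        (dst_dir / src.name).write_text(cleaned, encoding="utf-8")
        count += 1
    return count
```

Writing the cleaned copies to a separate directory keeps the original files, answers included, available for scoring the models' responses afterwards.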

Results: Before training, ChatGPT-4o correctly answered 75.2% of the 1395 questions, outperforming Gemini 1.5, which answered 64.9% correctly (p < 0.001). After training with PDF files, ChatGPT-4o's accuracy increased to 77.8%, while Gemini 1.5 improved significantly to 84.7% (p < 0.001). Training with TXT files showed similar results, with ChatGPT-4o maintaining 77.8% accuracy and Gemini 1.5 further improving to 87.6% (p < 0.001).

Conclusions: The study highlights that while ChatGPT-4o has strong baseline performance, specific training does not significantly enhance its accuracy. Conversely, Gemini 1.5, despite its lower initial performance, shows substantial improvement with training, particularly with TXT files. These findings suggest Gemini 1.5's superior ability to store and retrieve information, making it potentially more effective in clinical applications, albeit with a dependency on additional data for optimal performance.
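
The reported significance levels can be roughly sanity-checked from the percentages alone. The abstract does not state which statistical test the authors used, so the sketch below assumes an unpaired two-proportion z-test; because both models answered the same 1395 questions, a paired test such as McNemar's may have been more appropriate, but it requires per-question data that the abstract does not give. The correct-answer counts are reconstructed by rounding the reported percentages and are therefore approximations, and the function names are hypothetical.

```python
import math

N_QUESTIONS = 1395  # total number of questions, from the abstract


def correct_count(percent: float, n: int = N_QUESTIONS) -> int:
    """Approximate number of correct answers from a reported percentage."""
    return round(percent / 100 * n)


def two_proportion_z_test(correct_a: int, correct_b: int, n: int):
    """Unpaired two-sided z-test for two proportions with equal sample size n.

    Assumption: the samples are treated as independent, which the
    study design (same questions for every condition) does not guarantee.
    """
    p_a, p_b = correct_a / n, correct_b / n
    pooled = (correct_a + correct_b) / (2 * n)
    std_err = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (p_a - p_b) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


comparisons = {
    "baseline: ChatGPT-4o vs Gemini 1.5": (75.2, 64.9),
    "ChatGPT-4o: PDF training vs baseline": (77.8, 75.2),
    "Gemini 1.5: PDF training vs baseline": (84.7, 64.9),
}

for label, (pct_a, pct_b) in comparisons.items():
    z, p = two_proportion_z_test(
        correct_count(pct_a), correct_count(pct_b), N_QUESTIONS
    )
    print(f"{label}: z = {z:.2f}, p = {p:.3g}")
```

Under these assumptions, the baseline gap between the two models and Gemini 1.5's gain after PDF training both fall well below p = 0.001, while ChatGPT-4o's change from 75.2% to 77.8% does not reach the conventional 0.05 threshold, which matches the conclusion that training did not significantly improve its accuracy.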


Source journal

Pediatric Nephrology (Medicine: Urology & Nephrology)

CiteScore: 4.70

Self-citation rate: 20.00%

Articles published: 465

Review time: 1 month

Journal description: Pediatric Nephrology, the journal of the International Pediatric Nephrology Association, publishes original clinical research related to acute and chronic diseases that affect renal function, blood pressure, and fluid and electrolyte balance in children. Studies may involve medical, surgical, nutritional, physiologic, biochemical, genetic, pathologic or immunologic aspects of disease, imaging techniques, or consequences of acute or chronic kidney disease. There are 12 issues per year that contain Editorial Commentaries, Reviews, Educational Reviews, Original Articles, Brief Reports, Rapid Communications, Clinical Quizzes, and Letters to the Editors.