Diagnostic Accuracy of a Custom Large Language Model on Rare Pediatric Disease Case Reports

IF 16.4 · Tier 1 (Chemistry) · JCR Q1 (Chemistry, Multidisciplinary) · Accounts of Chemical Research · Pub Date: 2024-09-13 · DOI: 10.1002/ajmg.a.63878

Cameron C. Young, Ellie Enichen, Christian Rivera, Corinne A. Auger, Nathan Grant, Arya Rao, Marc D. Succi


Citations: 0



Diagnostic Accuracy of a Custom Large Language Model on Rare Pediatric Disease Case Reports

Accurately diagnosing rare pediatric diseases frequently represents a clinical challenge because of their complex and unusual clinical presentations. Here, we explore the capabilities of three large language models (LLMs), GPT-4, Gemini Pro, and a custom-built LLM (GPT-4 integrated with the Human Phenotype Ontology [GPT-4 HPO]), by evaluating their diagnostic performance on 61 rare pediatric disease case reports. The performance of the LLMs was assessed for accuracy in identifying the specific diagnosis, listing the correct diagnosis in a differential list, and identifying the broad disease category. In addition, GPT-4 HPO was tested on 100 general pediatrics case reports previously assessed on other LLMs to further validate its performance. The results indicated that GPT-4 predicted the correct diagnosis with an accuracy of 13.1%, whereas GPT-4 HPO and Gemini Pro each had a diagnostic accuracy of 8.2%. Further, GPT-4 HPO outperformed the other two LLMs in including the correct diagnosis in its differential list and in identifying the broad disease category. Although these findings underscore the potential of LLMs for diagnostic support, particularly when enhanced with domain-specific ontologies, they also stress the need for further improvement prior to integration into clinical practice.
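As a rough illustration of the three measures named in the abstract, the following Python sketch scores a set of case results for exact-diagnosis accuracy, differential-list accuracy, and broad-category accuracy. The `CaseResult` record, its field names, and the helper functions are hypothetical and are not taken from the study; the only figures drawn from the abstract are the 61 cases and the reported percentages. Assuming all 61 cases were scored, 13.1% and 8.2% are consistent with 8 and 5 correct cases respectively; those counts are inferred from the percentages, not reported.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Hypothetical per-case scoring record (field names are assumptions)."""
    exact_match: bool      # top diagnosis equals the reported diagnosis
    in_differential: bool  # reported diagnosis appears in the differential list
    category_match: bool   # broad disease category is correct


def accuracy_pct(hits: int, total: int) -> float:
    """Percentage of hits, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * hits / total, 1)


def summarize(results: list[CaseResult]) -> dict[str, float]:
    """Compute the three accuracy measures over a list of scored cases."""
    n = len(results)
    return {
        "exact": accuracy_pct(sum(r.exact_match for r in results), n),
        "differential": accuracy_pct(sum(r.in_differential for r in results), n),
        "category": accuracy_pct(sum(r.category_match for r in results), n),
    }


# Consistency check against the abstract, assuming a denominator of 61 cases.
# The hit counts 8 and 5 are inferred, not reported in the abstract.
N_CASES = 61
print(accuracy_pct(8, N_CASES))  # 13.1
print(accuracy_pct(5, N_CASES))  # 8.2
```

Keeping the three measures as separate fields reflects that a model can miss the exact diagnosis while still listing it in the differential or naming the right category, which is the pattern the abstract describes for GPT-4 HPO.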


