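Diagnosing Glaucoma Based on the Ocular Hypertension Treatment Study Dataset Using Chat Generative Pre-Trained Transformer as a Large Language Model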

Ophthalmology Science, Volume 5, Issue 1, Article 100599 (IF 4.6; JCR Q1, Ophthalmology). Pub Date: 2025-01-01. Epub Date: 2024-08-22. DOI: 10.1016/j.xops.2024.100599

Hina Raja PhD, Xiaoqin Huang PhD, Mohammad Delsoz MD, Yeganeh Madadi PhD, Asma Poursoroush PhD, Asim Munawar PhD, Malik Y. Kahook MD, Siamak Yousefi PhD

Citations: 0

Diagnosing Glaucoma Based on the Ocular Hypertension Treatment Study Dataset Using Chat Generative Pre-Trained Transformer as a Large Language Model

Purpose

To evaluate the capabilities of Chat Generative Pre-Trained Transformer (ChatGPT), as a large language model (LLM), for diagnosing glaucoma using the Ocular Hypertension Treatment Study (OHTS) dataset, and to compare the diagnostic capability of ChatGPT 3.5 and ChatGPT 4.0.

Design

Prospective data collection study.

Participants

A total of 3170 eyes of 1585 subjects from the OHTS were included in this study.

Methods

We selected demographic, clinical, ocular, visual field, optic nerve head photo, and history of disease parameters of each participant and developed case reports by converting tabular data into textual format based on information from both eyes of all subjects. We then developed a procedure using the application programming interface of ChatGPT, an LLM-based chatbot, to automatically input prompts into a chat box. This was followed by querying 2 different generations of ChatGPT (versions 3.5 and 4.0) regarding the underlying diagnosis of each subject. We then evaluated the output responses based on several objective metrics.
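The tabular-to-text step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the field names (`iop_mmhg`, `vf_md_db`, `cup_disc_ratio`), the prompt wording, and the `ask_model` callable are all hypothetical, since the abstract does not list the OHTS variables, the prompt, or the API calls that were used. The model is passed in as a plain function so that any chat API client can be plugged in.

```python
from typing import Callable, Mapping

# Hypothetical per-eye fields and phrasing; the abstract does not name
# the actual OHTS variables that were turned into text.
EYE_FIELDS = {
    "iop_mmhg": "intraocular pressure {} mmHg",
    "vf_md_db": "visual field mean deviation {} dB",
    "cup_disc_ratio": "cup-to-disc ratio {}",
}


def describe_eye(label: str, eye: Mapping[str, float]) -> str:
    """Turn one eye's tabular values into a single sentence."""
    parts = [text.format(eye[key]) for key, text in EYE_FIELDS.items() if key in eye]
    if not parts:
        return f"{label}: no measurements recorded."
    return f"{label}: " + ", ".join(parts) + "."


def build_case_report(record: Mapping) -> str:
    """Combine demographics and both eyes into a textual case report."""
    sentences = [
        f"The patient is a {record['age']}-year-old {record['sex']}.",
        describe_eye("Right eye", record["right_eye"]),
        describe_eye("Left eye", record["left_eye"]),
    ]
    return " ".join(sentences)


def build_prompt(record: Mapping) -> str:
    """Append an illustrative yes/no question to the case report."""
    question = "Based on this information, does the patient have glaucoma? Answer yes or no."
    return build_case_report(record) + " " + question


def diagnose(record: Mapping, ask_model: Callable[[str], str]) -> bool:
    """Send the prompt to a caller-supplied model function and parse the reply."""
    reply = ask_model(build_prompt(record))
    return reply.strip().lower().startswith("yes")
```

In use, `ask_model` would wrap a call to whichever chat model is being evaluated; for a dry run it can be a stub such as `lambda prompt: "No"`. Parsing free-text replies by their first word is a simplification, and a real evaluation would need a more robust way to map responses to a diagnosis.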

Main Outcome Measures

Area under the receiver operating characteristic curve (AUC), accuracy, specificity, sensitivity, and F1 score.
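As a rough illustration of how these measures relate to one another, the sketch below computes them from binary labels (1 = glaucoma, 0 = no glaucoma). The function name `binary_metrics` is hypothetical, and the abstract does not state how the AUC was obtained. The sketch assumes the model returns only a hard yes/no answer, in which case the ROC curve has a single operating point and the trapezoidal AUC reduces to the mean of sensitivity and specificity.

```python
from typing import Dict, Sequence


def _ratio(numerator: int, denominator: int) -> float:
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity, F1, and single-point AUC."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    sensitivity = _ratio(tp, tp + fn)  # recall on glaucoma cases
    specificity = _ratio(tn, tn + fp)  # recall on non-glaucoma cases
    precision = _ratio(tp, tp + fp)
    f1 = (
        2 * precision * sensitivity / (precision + sensitivity)
        if (precision + sensitivity) > 0
        else 0.0
    )

    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        # With hard labels only, trapezoidal AUC equals balanced accuracy.
        "auc": (sensitivity + specificity) / 2,
    }
```

If the model also returned a confidence score per case, a threshold-swept AUC (for example via scikit-learn's `roc_auc_score`) would be the more usual choice; the single-point form here is only appropriate for yes/no outputs.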

Results

Chat Generative Pre-Trained Transformer 3.5 achieved an AUC of 0.74, accuracy of 66%, specificity of 64%, sensitivity of 85%, and F1 score of 0.72. Chat Generative Pre-Trained Transformer 4.0 obtained an AUC of 0.76, accuracy of 87%, specificity of 90%, sensitivity of 61%, and F1 score of 0.92.

Conclusions

The accuracy of ChatGPT 4.0 in diagnosing glaucoma based on input data from OHTS was promising. The overall accuracy of ChatGPT 4.0 was higher than that of ChatGPT 3.5. However, ChatGPT 3.5 was found to be more sensitive than ChatGPT 4.0. In its current forms, ChatGPT may serve as a useful tool for exploring the disease status of ocular hypertensive eyes when specific data are available for analysis. In the future, leveraging LLMs with multimodal capabilities, which would allow imaging and diagnostic testing to be integrated into the analyses, could further enhance diagnostic capability and accuracy.