{"title":"提高大型语言模型在糖尿病教育中的性能:检索-增强生成法","authors":"Dingqiao Wang, Jiangbo Liang, Jinguo Ye, Jingni Li, Jingpeng Li, Qikai Zhang, Qiuling Hu, Caineng Pan, Dongliang Wang, Zhong Liu, Wen Shi, Danli Shi, Fei Li, Bo Qu, Yingfeng Zheng","doi":"10.2196/58041","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Large language models (LLMs) demonstrated advanced performance in processing clinical information. However, commercially available LLMs lack specialized medical knowledge and remain susceptible to generating inaccurate information. Given the need for self-management in diabetes, patients commonly seek information online. We introduce the Retrieval-augmented Information System for Enhancement (RISE) framework and evaluate its performance in enhancing LLMs to provide accurate responses to diabetes-related inquiries.</p><p><strong>Objective: </strong>This study aimed to evaluate the potential of the RISE framework, an information retrieval and augmentation tool, to improve the LLM's performance to accurately and safely respond to diabetes-related inquiries.</p><p><strong>Methods: </strong>The RISE, an innovative retrieval augmentation framework, comprises 4 steps: rewriting query, information retrieval, summarization, and execution. Using a set of 43 common diabetes-related questions, we evaluated 3 base LLMs (GPT-4, Anthropic Claude 2, Google Bard) and their RISE-enhanced versions respectively. Assessments were conducted by clinicians for accuracy and comprehensiveness and by patients for understandability.</p><p><strong>Results: </strong>The integration of RISE significantly improved the accuracy and comprehensiveness of responses from all 3 base LLMs. On average, the percentage of accurate responses increased by 12% (15/129) with RISE. Specifically, the rates of accurate responses increased by 7% (3/43) for GPT-4, 19% (8/43) for Claude 2, and 9% (4/43) for Google Bard. The framework also enhanced response comprehensiveness, with mean scores improving by 0.44 (SD 0.10). Understandability was also enhanced by 0.19 (SD 0.13) on average. Data collection was conducted from September 30, 2023 to February 5, 2024.</p><p><strong>Conclusions: </strong>The RISE significantly improves LLMs' performance in responding to diabetes-related inquiries, enhancing accuracy, comprehensiveness, and understandability. These improvements have crucial implications for RISE's future role in patient education and chronic illness self-management, which contributes to relieving medical resource pressures and raising public awareness of medical knowledge.</p>","PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":" ","pages":"e58041"},"PeriodicalIF":5.8000,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Enhancement of the Performance of Large Language Models in Diabetes Education through Retrieval-Augmented Generation: Comparative Study.\",\"authors\":\"Dingqiao Wang, Jiangbo Liang, Jinguo Ye, Jingni Li, Jingpeng Li, Qikai Zhang, Qiuling Hu, Caineng Pan, Dongliang Wang, Zhong Liu, Wen Shi, Danli Shi, Fei Li, Bo Qu, Yingfeng Zheng\",\"doi\":\"10.2196/58041\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Large language models (LLMs) demonstrated advanced performance in processing clinical information. However, commercially available LLMs lack specialized medical knowledge and remain susceptible to generating inaccurate information. Given the need for self-management in diabetes, patients commonly seek information online. We introduce the Retrieval-augmented Information System for Enhancement (RISE) framework and evaluate its performance in enhancing LLMs to provide accurate responses to diabetes-related inquiries.</p><p><strong>Objective: </strong>This study aimed to evaluate the potential of the RISE framework, an information retrieval and augmentation tool, to improve the LLM's performance to accurately and safely respond to diabetes-related inquiries.</p><p><strong>Methods: </strong>The RISE, an innovative retrieval augmentation framework, comprises 4 steps: rewriting query, information retrieval, summarization, and execution. Using a set of 43 common diabetes-related questions, we evaluated 3 base LLMs (GPT-4, Anthropic Claude 2, Google Bard) and their RISE-enhanced versions respectively. Assessments were conducted by clinicians for accuracy and comprehensiveness and by patients for understandability.</p><p><strong>Results: </strong>The integration of RISE significantly improved the accuracy and comprehensiveness of responses from all 3 base LLMs. On average, the percentage of accurate responses increased by 12% (15/129) with RISE. Specifically, the rates of accurate responses increased by 7% (3/43) for GPT-4, 19% (8/43) for Claude 2, and 9% (4/43) for Google Bard. The framework also enhanced response comprehensiveness, with mean scores improving by 0.44 (SD 0.10). Understandability was also enhanced by 0.19 (SD 0.13) on average. Data collection was conducted from September 30, 2023 to February 5, 2024.</p><p><strong>Conclusions: </strong>The RISE significantly improves LLMs' performance in responding to diabetes-related inquiries, enhancing accuracy, comprehensiveness, and understandability. These improvements have crucial implications for RISE's future role in patient education and chronic illness self-management, which contributes to relieving medical resource pressures and raising public awareness of medical knowledge.</p>\",\"PeriodicalId\":16337,\"journal\":{\"name\":\"Journal of Medical Internet Research\",\"volume\":\" \",\"pages\":\"e58041\"},\"PeriodicalIF\":5.8000,\"publicationDate\":\"2024-11-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Medical Internet Research\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.2196/58041\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Medical Internet Research","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.2196/58041","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Enhancement of the Performance of Large Language Models in Diabetes Education through Retrieval-Augmented Generation: Comparative Study.
Background: Large language models (LLMs) demonstrated advanced performance in processing clinical information. However, commercially available LLMs lack specialized medical knowledge and remain susceptible to generating inaccurate information. Given the need for self-management in diabetes, patients commonly seek information online. We introduce the Retrieval-augmented Information System for Enhancement (RISE) framework and evaluate its performance in enhancing LLMs to provide accurate responses to diabetes-related inquiries.
Objective: This study aimed to evaluate the potential of the RISE framework, an information retrieval and augmentation tool, to improve the LLM's performance to accurately and safely respond to diabetes-related inquiries.
Methods: The RISE, an innovative retrieval augmentation framework, comprises 4 steps: rewriting query, information retrieval, summarization, and execution. Using a set of 43 common diabetes-related questions, we evaluated 3 base LLMs (GPT-4, Anthropic Claude 2, Google Bard) and their RISE-enhanced versions respectively. Assessments were conducted by clinicians for accuracy and comprehensiveness and by patients for understandability.
Results: The integration of RISE significantly improved the accuracy and comprehensiveness of responses from all 3 base LLMs. On average, the percentage of accurate responses increased by 12% (15/129) with RISE. Specifically, the rates of accurate responses increased by 7% (3/43) for GPT-4, 19% (8/43) for Claude 2, and 9% (4/43) for Google Bard. The framework also enhanced response comprehensiveness, with mean scores improving by 0.44 (SD 0.10). Understandability was also enhanced by 0.19 (SD 0.13) on average. Data collection was conducted from September 30, 2023 to February 5, 2024.
Conclusions: The RISE significantly improves LLMs' performance in responding to diabetes-related inquiries, enhancing accuracy, comprehensiveness, and understandability. These improvements have crucial implications for RISE's future role in patient education and chronic illness self-management, which contributes to relieving medical resource pressures and raising public awareness of medical knowledge.
期刊介绍:
The Journal of Medical Internet Research (JMIR) is a highly respected publication in the field of health informatics and health services. With a founding date in 1999, JMIR has been a pioneer in the field for over two decades.
As a leader in the industry, the journal focuses on digital health, data science, health informatics, and emerging technologies for health, medicine, and biomedical research. It is recognized as a top publication in these disciplines, ranking in the first quartile (Q1) by Impact Factor.
Notably, JMIR holds the prestigious position of being ranked #1 on Google Scholar within the "Medical Informatics" discipline.