Large Language Models (LLMs) show considerable potential for improving the retrieval of health information. However, the hallucinations they produce pose a security challenge. This study aimed to improve the accuracy and reliability of LLMs in hypertension education by integrating Retrieval-Augmented Generation (RAG) technology. We constructed a supplementary hypertension knowledge base and integrated it with RAG, resulting in the HEART (Hypertension Enhancing Answer Retrieval Tool) framework. A set of 50 frequently asked questions about hypertension was used to evaluate the performance of four base LLMs (ChatGPT-4o, Claude-3.5, Gemini-2.5, and Llama-3.3) as well as their corresponding HEART-enhanced versions. Clinical experts rated each response for accuracy, completeness, consistency, robustness, security, and overall quality. Integration with the HEART framework significantly improved the performance of all four LLMs across five key evaluation dimensions: accuracy, completeness, consistency, security, and robustness (all P < 0.05). The mean overall quality score increased significantly for every model: from 3.57 (SD 0.72) to 4.20 (SD 0.41) for Llama-3.3, from 3.92 (SD 0.70) to 4.38 (SD 0.42) for Claude-3.5, from 3.91 (SD 0.73) to 4.32 (SD 0.39) for ChatGPT-4o, and from 4.03 (SD 0.69) to 4.38 (SD 0.41) for Gemini-2.5 (all P < 0.001). This study highlights the importance of combining high-quality, domain-specific medical data with advanced artificial intelligence techniques to improve accuracy and reduce misinformation in healthcare applications.
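The general RAG pattern that HEART builds on can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a simple bag-of-words cosine-similarity retriever over a tiny hypothetical knowledge base. The passages, the function names `retrieve` and `build_prompt`, and the prompt wording are all illustrative assumptions, because the abstract does not specify HEART's retriever, embeddings, or prompts.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base (assumption): the real HEART
# knowledge base content is not described in the abstract.
KNOWLEDGE_BASE = [
    "Reducing sodium intake can help lower blood pressure.",
    "Regular aerobic exercise is recommended for people with hypertension.",
    "Home blood pressure should be measured at the same time each day.",
]


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words token counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Hypothetical retriever: return the k passages most similar to the question."""
    q = tokenize(question)
    ranked = sorted(docs, key=lambda d: cosine(q, tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: list[str], k: int = 2) -> str:
    """Hypothetical prompt builder: ground the LLM in the retrieved passages."""
    context = "\n".join(f"- {p}" for p in retrieve(question, docs, k))
    return (
        "Answer the hypertension question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_prompt("How does sodium affect blood pressure?", KNOWLEDGE_BASE))
```

A deployed system would typically replace the bag-of-words scorer with dense embeddings and send the assembled prompt to one of the evaluated LLMs. That generation step is omitted here.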
