Enhancing Dietary Supplement Question Answer via Retrieval-Augmented Generation (RAG) with LLM
Yu Hou, Rui Zhang
medRxiv - Health Informatics, published 2024-09-12. DOI: 10.1101/2024.09.11.24313513 (https://doi.org/10.1101/2024.09.11.24313513)
Objective: To enhance the accuracy and reliability of dietary supplement (DS) question answering by integrating a novel Retrieval-Augmented Generation (RAG) LLM system with an updated and integrated DS knowledge base and by providing a user-friendly interface.
Materials and Methods: We developed iDISK2.0 by integrating updated data from multiple trusted sources, including NMCD, MSKCC, DSLD, and NHPD, and applied advanced integration strategies to reduce noise. We then combined iDISK2.0 with a RAG system, leveraging the strengths of large language models (LLMs) and a biomedical knowledge graph (BKG) to address the hallucination issues inherent in standalone LLMs. The system enhances answer generation by using LLMs (GPT-4.0) to retrieve contextually relevant subgraphs from the BKG based on the entities identified in the query. A user-friendly interface was built to give easy access to DS knowledge through conversational text input.
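The retrieval step described in the methods (identify entities in the query, pull the matching subgraph from the knowledge graph, hand it to the LLM as context) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the toy triples are placeholders rather than iDISK2.0 content, entity identification is done with naive substring matching instead of the GPT-4.0 step the abstract mentions, and `llm` is a stand-in callable, not a real API client.

```python
# Hypothetical toy knowledge graph as (head, relation, tail) triples.
# These are illustrative placeholders, not data from iDISK2.0.
TRIPLES = [
    ("SupplementA", "interacts_with", "DrugX"),
    ("SupplementA", "has_ingredient", "IngredientY"),
    ("SupplementB", "is_effective_for", "ConditionZ"),
]


def identify_entities(query, triples):
    """Naive entity linking: case-insensitive substring match on node names.

    Assumption: the paper uses an LLM for this step; substring matching is
    swapped in here only to keep the sketch self-contained.
    """
    nodes = {t[0] for t in triples} | {t[2] for t in triples}
    lowered = query.lower()
    return sorted(n for n in nodes if n.lower() in lowered)


def retrieve_subgraph(entities, triples):
    """Return every triple that touches at least one identified entity."""
    wanted = set(entities)
    return [t for t in triples if t[0] in wanted or t[2] in wanted]


def build_prompt(query, subgraph):
    """Serialize the retrieved triples as plain-text facts ahead of the question."""
    facts = "\n".join(f"- {h} {r.replace('_', ' ')} {t}" for h, r, t in subgraph)
    return (
        "Answer using only the facts below. If they are insufficient, say so.\n"
        f"Facts:\n{facts}\n\nQuestion: {query}"
    )


def answer(query, llm, triples=TRIPLES):
    """Run the pipeline; `llm` is any callable mapping a prompt string to text."""
    entities = identify_entities(query, triples)
    subgraph = retrieve_subgraph(entities, triples)
    return llm(build_prompt(query, subgraph))
```

Passing `llm=lambda prompt: prompt` returns the assembled prompt unchanged, which is a convenient way to inspect which facts were retrieved before wiring in a real model.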
Results: iDISK2.0 encompasses 174,317 entities across seven types, six types of relationships, and 471,063 attributes. The iDISK2.0-RAG system significantly improved the accuracy of DS-related information retrieval. Our evaluations showed that the system achieved over 95% accuracy in answering True/False and multiple-choice questions, outperforming standalone LLMs. Additionally, the user-friendly interface enabled efficient interaction, allowing users to input free-form text queries and receive accurate, contextually relevant responses. The integration process minimized data noise and ensured that the most up-to-date and comprehensive DS information was available to users.
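Accuracy on True/False and multiple-choice questions is the fraction of questions answered correctly. The sketch below shows one way such a score could be computed. The abstract does not say how answers were matched, so the exact-match rule after light normalization is an assumption, and the function names are hypothetical.

```python
def normalize(answer):
    """Lowercase, trim whitespace, and drop a trailing period before comparing."""
    return answer.strip().lower().rstrip(".")


def accuracy(predictions, gold):
    """Fraction of predictions that exactly match the gold answer after normalization."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty question set")
    correct = sum(normalize(p) == normalize(g) for p, g in zip(predictions, gold))
    return correct / len(gold)
```

The same function can score both the RAG system and a standalone LLM on an identical question set, which is what makes the comparison between the two meaningful.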
Conclusion: The integration of iDISK2.0 with a RAG system effectively addresses the limitations of LLMs, providing a robust solution for accurate DS information retrieval. This study underscores the importance of combining structured knowledge graphs with advanced language models to enhance the precision and reliability of information retrieval systems, ultimately supporting better-informed decisions in DS-related research and healthcare.