Automating pharmacovigilance evidence generation: using large language models to produce context-aware structured query language.

JAMIA Open 2025;8(1):ooaf003. Published 2025-02-08 (eCollection 2025-02-01). DOI: 10.1093/jamiaopen/ooaf003. Impact factor 3.4; JCR Q2, Health Care Sciences & Services.
Jeffery L Painter, Venkateswara Rao Chalamalasetti, Raymond Kassekert, Andrew Bate

Abstract

Objective: To improve the accuracy of information retrieval from pharmacovigilance (PV) databases by using large language models (LLMs), supported by a business context document, to convert natural language queries (NLQs) into Structured Query Language (SQL) queries.

Materials and methods: We used OpenAI's GPT-4 model within a retrieval-augmented generation (RAG) framework, enriched with a business context document, to transform NLQs into executable SQL queries. Each NLQ was presented to the LLM in random order and independently of the others to prevent memorization. The study was conducted in 3 phases that varied query complexity and assessed the LLM's performance both with and without the business context document.
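The abstract does not give the prompt format or the retrieval mechanism. As a rough illustration of how a business context document can be combined with a database schema in a RAG-style prompt, the minimal Python sketch below assumes a toy PV schema and a few context sections; all table and column names, and the helpers `retrieve`, `build_prompt`, and `prompts_in_random_order`, are hypothetical. It ranks context sections by simple word overlap, a stand-in for the embedding search a production RAG pipeline would more likely use, and it stops short of calling any model API. Shuffling the questions and building every prompt from scratch mirrors the random, independent presentation described above.

```python
import random
import re

# Hypothetical business context document, split into sections keyed by topic.
# Each section maps a PV business term to tables, columns, or filter rules.
BUSINESS_CONTEXT = {
    "serious cases": "A case is serious when CASES.SERIOUS_FLAG = 'Y'.",
    "suspect drug": "Suspect products are rows in CASE_PRODUCTS with ROLE = 'S'.",
    "event terms": "Adverse events are coded as MedDRA preferred terms in CASE_EVENTS.PT_NAME.",
}

# Hypothetical schema summary; the study's database schema is not given in the abstract.
SCHEMA = """CASES(CASE_ID, RECEIPT_DATE, SERIOUS_FLAG)
CASE_PRODUCTS(CASE_ID, PRODUCT_NAME, ROLE)
CASE_EVENTS(CASE_ID, PT_NAME)"""


def tokens(text: str) -> set[str]:
    """Lower-cased alphabetic words, used for a crude relevance score."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(nlq: str, k: int = 2) -> list[str]:
    """Return the k context sections sharing the most words with the question.

    Word overlap stands in for the embedding search a RAG system would usually use.
    """
    q = tokens(nlq)
    ranked = sorted(
        BUSINESS_CONTEXT.items(),
        key=lambda item: len(q & tokens(item[0] + " " + item[1])),
        reverse=True,
    )
    return [text for _, text in ranked[:k]]


def build_prompt(nlq: str, use_context: bool = True) -> str:
    """Assemble schema, optional retrieved context, and the question into one prompt."""
    parts = ["Translate the question into SQL for this schema.", "Schema:", SCHEMA]
    if use_context:  # the "with business context document" condition
        parts += ["Business context:", *retrieve(nlq)]
    parts += ["Question:", nlq, "Return only the SQL query."]
    return "\n".join(parts)


def prompts_in_random_order(nlqs, use_context=True, seed=None):
    """Shuffle the questions and build each prompt from scratch (no shared history)."""
    order = list(nlqs)
    random.Random(seed).shuffle(order)
    return [(nlq, build_prompt(nlq, use_context)) for nlq in order]
```

Sending each prompt as a fresh request, rather than within one running conversation, keeps earlier questions and answers out of the model's context. The request to the model itself (GPT-4 in the study) is left out because the abstract does not describe how it was made.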

Results: Our approach significantly improved NLQ-to-SQL accuracy, increasing from 8.3% with the database schema alone to 78.3% with the business context document. This enhancement was consistent across low, medium, and high complexity queries, indicating the critical role of contextual knowledge in query generation.
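Accuracy here is a proportion of correctly answered queries, so an overall figure is a pooled rate across complexity tiers, and leaving out the hardest tier can raise it. The short sketch below shows that arithmetic on invented outcomes that do not reproduce the study's counts; the tier labels follow the abstract, while the `(tier, is_correct)` data layout and the helper names are assumptions.

```python
from collections import defaultdict


def accuracy_by_tier(outcomes):
    """Accuracy per complexity tier; outcomes is a list of (tier, is_correct) pairs."""
    totals, correct = defaultdict(int), defaultdict(int)
    for tier, ok in outcomes:
        totals[tier] += 1
        correct[tier] += int(ok)
    return {tier: correct[tier] / totals[tier] for tier in totals}


def pooled_accuracy(outcomes, exclude=()):
    """Accuracy over all queries, optionally leaving out some tiers."""
    kept = [ok for tier, ok in outcomes if tier not in exclude]
    return sum(kept) / len(kept) if kept else float("nan")


# Invented outcomes for illustration only (not the study's data).
OUTCOMES = [
    ("low", True), ("low", True),
    ("medium", True), ("medium", False),
    ("high", True), ("high", False),
]

if __name__ == "__main__":
    print(accuracy_by_tier(OUTCOMES))           # {'low': 1.0, 'medium': 0.5, 'high': 0.5}
    print(pooled_accuracy(OUTCOMES))            # 0.6666666666666666
    print(pooled_accuracy(OUTCOMES, {"high"}))  # 0.75
```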

Discussion: The integration of a business context document markedly improved the LLM's ability to generate accurate SQL queries (ie, queries that both execute and return semantically appropriate results). Accuracy reached a maximum of 85% when high complexity queries were excluded, suggesting promise for routine deployment.
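The abstract defines an accurate query as one that both executes and returns semantically appropriate results. One way to automate part of that check, sketched below under stated assumptions, is to run the generated query and a hand-written reference query against the same database and compare the returned rows while ignoring order. The in-memory SQLite tables, column names, `REFERENCE_SQL`, and the helpers `make_toy_db` and `is_accurate` are hypothetical, and row-level equality is only a proxy: the abstract does not say how semantic appropriateness was judged, which may have involved expert review.

```python
import sqlite3
from collections import Counter

# Hypothetical reference query: serious cases where DrugX is a suspect product.
REFERENCE_SQL = """
SELECT COUNT(DISTINCT c.case_id)
FROM cases c JOIN case_products p ON p.case_id = c.case_id
WHERE c.serious_flag = 'Y' AND p.product_name = 'DrugX' AND p.role = 'S'
"""


def make_toy_db() -> sqlite3.Connection:
    """Build a tiny in-memory PV-style database (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cases (case_id INTEGER PRIMARY KEY, serious_flag TEXT);
        CREATE TABLE case_products (case_id INTEGER, product_name TEXT, role TEXT);
        INSERT INTO cases VALUES (1, 'Y'), (2, 'N'), (3, 'Y');
        INSERT INTO case_products VALUES
            (1, 'DrugX', 'S'), (2, 'DrugX', 'S'), (3, 'DrugX', 'C');
    """)
    return conn


def is_accurate(conn: sqlite3.Connection, generated_sql: str, reference_sql: str) -> bool:
    """Score a generated query as correct only if it executes and matches the reference rows."""
    try:
        got = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        return False  # not executable (syntax error, unknown table or column, ...)
    expected = conn.execute(reference_sql).fetchall()
    # Compare as multisets so row order does not matter but duplicate rows do.
    return Counter(got) == Counter(expected)


if __name__ == "__main__":
    conn = make_toy_db()
    candidates = {
        "correct rewrite": "SELECT COUNT(*) FROM cases WHERE serious_flag = 'Y' AND case_id IN "
                           "(SELECT case_id FROM case_products WHERE product_name = 'DrugX' AND role = 'S')",
        "missing role filter": "SELECT COUNT(*) FROM cases c JOIN case_products p ON p.case_id = c.case_id "
                               "WHERE c.serious_flag = 'Y' AND p.product_name = 'DrugX'",
        "not executable": "SELEC COUNT(*) FROM cases",
    }
    for label, sql in candidates.items():
        print(label, is_accurate(conn, sql, REFERENCE_SQL))
```

In this toy run the "missing role filter" candidate executes but also counts case 3, where DrugX is only concomitant, so it is scored as inaccurate. That illustrates how a query can be executable yet still wrong when a business rule (here, which product role counts as "suspect") is absent from the model's context.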

Conclusion: This study presents a novel approach to employing LLMs for safety data retrieval and analysis, demonstrating significant advancements in query generation accuracy. The methodology offers a framework applicable to various data-intensive domains, enhancing the accessibility of information retrieval for non-technical users.
