LmRaC: a functionally extensible tool for LLM interrogation of user experimental results.

Douglas B Craig, Sorin Drăghici
{"title":"LmRaC:功能可扩展的 LLM 用户实验结果查询工具。","authors":"Douglas B Craig, Sorin Drăghici","doi":"10.1093/bioinformatics/btae679","DOIUrl":null,"url":null,"abstract":"<p><strong>Motivation: </strong>Large Language Models (LLMs) have provided spectacular results across a wide variety of domains. However, persistent concerns about hallucination and fabrication of authoritative sources raise serious issues for their integral use in scientific research. Retrieval-augmented generation (RAG) is a technique for making data and documents, otherwise unavailable during training, available to the LLM for reasoning tasks. In addition to making dynamic and quantitative data available to the LLM, RAG provides the means by which to carefully control and trace source material, thereby ensuring results are accurate, complete and authoritative.</p><p><strong>Results: </strong>Here we introduce LmRaC, an LLM-based tool capable of answering complex scientific questions in the context of a user's own experimental results. LmRaC allows users to dynamically build domain specific knowledge-bases from PubMed sources (RAGdom). Answers are drawn solely from this RAG with citations to the paragraph level, virtually eliminating any chance of hallucination or fabrication. These answers can then be used to construct an experimental context (RAGexp) that, along with user supplied documents (e.g., design, protocols) and quantitative results, can be used to answer questions about the user's specific experiment. Questions about quantitative experimental data are integral to LmRaC and are supported by a user-defined and functionally extensible REST API server (RAGfun).</p><p><strong>Availability and implementation: </strong>Detailed documentation for LmRaC along with a sample REST API server for defining user functions can be found at https://github.com/dbcraig/LmRaC. The LmRaC web application image can be pulled from Docker Hub (https://hub.docker.com) as dbcraig/lmrac.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"LmRaC: a functionally extensible tool for LLM interrogation of user experimental results.\",\"authors\":\"Douglas B Craig, Sorin Drăghici\",\"doi\":\"10.1093/bioinformatics/btae679\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Motivation: </strong>Large Language Models (LLMs) have provided spectacular results across a wide variety of domains. However, persistent concerns about hallucination and fabrication of authoritative sources raise serious issues for their integral use in scientific research. Retrieval-augmented generation (RAG) is a technique for making data and documents, otherwise unavailable during training, available to the LLM for reasoning tasks. In addition to making dynamic and quantitative data available to the LLM, RAG provides the means by which to carefully control and trace source material, thereby ensuring results are accurate, complete and authoritative.</p><p><strong>Results: </strong>Here we introduce LmRaC, an LLM-based tool capable of answering complex scientific questions in the context of a user's own experimental results. LmRaC allows users to dynamically build domain specific knowledge-bases from PubMed sources (RAGdom). 
Answers are drawn solely from this RAG with citations to the paragraph level, virtually eliminating any chance of hallucination or fabrication. These answers can then be used to construct an experimental context (RAGexp) that, along with user supplied documents (e.g., design, protocols) and quantitative results, can be used to answer questions about the user's specific experiment. Questions about quantitative experimental data are integral to LmRaC and are supported by a user-defined and functionally extensible REST API server (RAGfun).</p><p><strong>Availability and implementation: </strong>Detailed documentation for LmRaC along with a sample REST API server for defining user functions can be found at https://github.com/dbcraig/LmRaC. The LmRaC web application image can be pulled from Docker Hub (https://hub.docker.com) as dbcraig/lmrac.</p>\",\"PeriodicalId\":93899,\"journal\":{\"name\":\"Bioinformatics (Oxford, England)\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-11-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Bioinformatics (Oxford, England)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/bioinformatics/btae679\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics (Oxford, England)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/bioinformatics/btae679","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Motivation: Large Language Models (LLMs) have provided spectacular results across a wide variety of domains. However, persistent concerns about hallucination and fabrication of authoritative sources raise serious issues for their integral use in scientific research. Retrieval-augmented generation (RAG) is a technique for making data and documents, otherwise unavailable during training, available to the LLM for reasoning tasks. In addition to making dynamic and quantitative data available to the LLM, RAG provides the means by which to carefully control and trace source material, thereby ensuring results are accurate, complete and authoritative.
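
To make the retrieval step concrete, the minimal sketch below shows the generic RAG pattern the abstract describes: retrieve the most relevant indexed paragraphs for a question and assemble a prompt that constrains the model to answer only from them, with paragraph-level citations. It is an illustration of the general technique under stated assumptions (a toy lexical scorer standing in for real embedding search, and an invented in-memory paragraph store), not LmRaC's implementation.

```python
# Minimal, generic RAG sketch: retrieve the most relevant stored paragraphs
# for a question and assemble a prompt that cites them explicitly.
# Illustrative only; NOT LmRaC's implementation.

from collections import Counter
import math

# Hypothetical paragraph store: (citation_id, text) pairs, e.g. indexed from PubMed.
PARAGRAPHS = [
    ("PMID:0000001 para.2", "Gene X is upregulated in hypoxic tumour cells ..."),
    ("PMID:0000002 para.5", "Pathway Y activation correlates with drug resistance ..."),
    ("PMID:0000003 para.1", "Gene X knockdown reduces proliferation in cell line Z ..."),
]

def _tokens(text: str) -> Counter:
    return Counter(text.lower().split())

def score(query: str, passage: str) -> float:
    """Crude lexical-overlap score (stand-in for embedding similarity)."""
    q, p = _tokens(query), _tokens(passage)
    overlap = sum(min(q[t], p[t]) for t in q)
    return overlap / math.sqrt(sum(p.values()) + 1)

def retrieve(query: str, k: int = 2):
    """Return the k best-matching (citation_id, text) pairs."""
    ranked = sorted(PARAGRAPHS, key=lambda cp: score(query, cp[1]), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Build a prompt that restricts answers to the retrieved, cited paragraphs."""
    context = "\n".join(f"[{cid}] {text}" for cid, text in retrieve(query))
    return (
        "Answer using ONLY the cited paragraphs below; cite them by ID.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )

if __name__ == "__main__":
    print(build_prompt("What is known about Gene X in tumour cells?"))
```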

Results: Here we introduce LmRaC, an LLM-based tool capable of answering complex scientific questions in the context of a user's own experimental results. LmRaC allows users to dynamically build domain-specific knowledge bases from PubMed sources (RAGdom). Answers are drawn solely from this RAG with citations to the paragraph level, virtually eliminating any chance of hallucination or fabrication. These answers can then be used to construct an experimental context (RAGexp) that, along with user-supplied documents (e.g., design, protocols) and quantitative results, can be used to answer questions about the user's specific experiment. Questions about quantitative experimental data are integral to LmRaC and are supported by a user-defined and functionally extensible REST API server (RAGfun).
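
As an illustration of what a user-defined quantitative function behind a REST endpoint might look like, the sketch below uses Flask. The endpoint path, the request and response schemas, and the toy fold-change data are all assumptions made for this example, not LmRaC's actual RAGfun interface; the real interface is documented with the sample server in the GitHub repository.

```python
# Hypothetical sketch of a user-defined quantitative function exposed over REST,
# in the spirit of an extensible function server. Endpoint name, schemas, and
# data are illustrative assumptions, not LmRaC's documented interface.

from statistics import mean
from flask import Flask, jsonify, request

app = Flask(__name__)

# Toy stand-in for the user's quantitative results (gene -> log2 fold change).
LOG2_FC = {"TP53": -1.2, "MYC": 2.4, "EGFR": 0.8}

@app.route("/function/mean_log2fc", methods=["POST"])
def mean_log2fc():
    """Return the mean log2 fold change for a requested list of genes."""
    genes = request.get_json(force=True).get("genes", [])
    values = [LOG2_FC[g] for g in genes if g in LOG2_FC]
    if not values:
        return jsonify({"error": "no matching genes"}), 404
    return jsonify({"genes": genes, "mean_log2fc": mean(values)})

if __name__ == "__main__":
    app.run(port=8000)  # port chosen arbitrarily for this sketch
```

Such an endpoint could then be exercised with, for example, `curl -X POST http://localhost:8000/function/mean_log2fc -H "Content-Type: application/json" -d '{"genes": ["TP53", "MYC"]}'`.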

Availability and implementation: Detailed documentation for LmRaC along with a sample REST API server for defining user functions can be found at https://github.com/dbcraig/LmRaC. The LmRaC web application image can be pulled from Docker Hub (https://hub.docker.com) as dbcraig/lmrac.
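
In practice, the published image can be retrieved with `docker pull dbcraig/lmrac`; run-time options and configuration are described in the GitHub documentation rather than assumed here.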
