Open-source large language models in action: A bioinformatics chatbot for PRIDE database.

IF 3.4 | Zone 4 (Biology) | JCR Q2 (Biochemical Research Methods) | Proteomics | Pub Date: 2024-11-01 | Epub Date: 2024-03-31 | DOI: 10.1002/pmic.202400005
Jingwen Bai, Selvakumar Kamatchinathan, Deepti J Kundu, Chakradhar Bandla, Juan Antonio Vizcaíno, Yasset Perez-Riverol
Citations: 0

Abstract

We here present a chatbot assistant infrastructure (https://www.ebi.ac.uk/pride/chatbot/) that simplifies user interactions with the PRIDE database's documentation and dataset search functionality. The framework utilizes multiple Large Language Models (LLM): llama2, chatglm, mixtral (mistral), and openhermes. It also includes a web service API (Application Programming Interface), web interface, and components for indexing and managing vector databases. An Elo-ranking system-based benchmark component is included in the framework as well, which allows for evaluating the performance of each LLM and for improving PRIDE documentation. The chatbot not only allows users to interact with PRIDE documentation but can also be used to search and find PRIDE datasets using an LLM-based recommendation system, enabling dataset discoverability. Importantly, while our infrastructure is exemplified through its application in the PRIDE database context, the modular and adaptable nature of our approach positions it as a valuable tool for improving user experiences across a spectrum of bioinformatics and proteomics tools and resources, among other domains. The integration of advanced LLMs, innovative vector-based construction, the benchmarking framework, and optimized documentation collectively form a robust and transferable chatbot assistant infrastructure. The framework is open-source (https://github.com/PRIDE-Archive/pride-chatbot).
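
The Elo-based benchmark mentioned in the abstract can be illustrated with a minimal sketch of the standard Elo update applied to pairwise comparisons between model answers. This is not taken from the pride-chatbot code: the starting rating of 1000, the K-factor of 32, the function names, and the vote list are all assumptions made for illustration, and the framework's actual parameters and vote-collection method may differ.

```python
# Minimal Elo sketch for ranking LLMs from pairwise preference votes.
# Assumptions (not from the paper): initial rating 1000, K-factor 32,
# and the hypothetical votes listed below.

MODELS = ["llama2", "chatglm", "mixtral", "openhermes"]


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings: dict, winner: str, loser: str, k: float = 32.0) -> dict:
    """Shift rating points from loser to winner in place and return the dict."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta
    return ratings


if __name__ == "__main__":
    ratings = {name: 1000.0 for name in MODELS}
    # Hypothetical (winner, loser) votes, e.g. a user preferring one answer.
    votes = [("mixtral", "llama2"), ("openhermes", "chatglm"), ("mixtral", "openhermes")]
    for winner, loser in votes:
        update_elo(ratings, winner, loser)
    # Print models from highest to lowest rating.
    for name, rating in sorted(ratings.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {rating:.1f}")
```

Because each update moves the same number of points from the loser to the winner, the sum of all ratings stays constant, so only the relative ordering of the models carries information.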

Source journal: Proteomics (Biology: Biochemical Research Methods)
CiteScore: 6.30
Self-citation rate: 5.90%
Articles published: 193
Review time: 3 months
Journal description: PROTEOMICS is the premier international source for information on all aspects of applications and technologies, including software, in proteomics and other "omics". The journal includes but is not limited to proteomics, genomics, transcriptomics, metabolomics and lipidomics, and systems biology approaches. Papers describing novel applications of proteomics and integration of multi-omics data and approaches are especially welcome.
Latest articles from this journal:
Special Issue on "Metaproteomics and meta-omics perspectives to decrypt Microbiome Functionality".
In-Depth Proteome Profiling of the Hippocampus of LDLR Knockout Mice Reveals Alternation in Synaptic Signaling Pathway.
Parallel Analyses by Mass Spectrometry (MS) and Reverse Phase Protein Array (RPPA) Reveal Complementary Proteomic Profiles in Triple-Negative Breast Cancer (TNBC) Patient Tissues and Cell Cultures.
Review and Practical Guide for Getting Started With Single-Cell Proteomics.
Omics Studies in CKD: Diagnostic Opportunities and Therapeutic Potential.