LitSumm: large language models for literature summarization of noncoding RNAs.

IF 3.6 4区 生物学 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Database: The Journal of Biological Databases and Curation Pub Date : 2025-02-05 DOI:10.1093/database/baaf006
Andrew Green, Carlos Eduardo Ribas, Nancy Ontiveros-Palacios, Sam Griffiths-Jones, Anton I Petrov, Alex Bateman, Blake Sweeney
{"title":"LitSumm: large language models for literature summarization of noncoding RNAs.","authors":"Andrew Green, Carlos Eduardo Ribas, Nancy Ontiveros-Palacios, Sam Griffiths-Jones, Anton I Petrov, Alex Bateman, Blake Sweeney","doi":"10.1093/database/baaf006","DOIUrl":null,"url":null,"abstract":"<p><p>Curation of literature in life sciences is a growing challenge. The continued increase in the rate of publication, coupled with the relatively fixed number of curators worldwide, presents a major challenge to developers of biomedical knowledgebases. Very few knowledgebases have resources to scale to the whole relevant literature and all have to prioritize their efforts. In this work, we take a first step to alleviating the lack of curator time in RNA science by generating summaries of literature for noncoding RNAs using large language models (LLMs). We demonstrate that high-quality, factually accurate summaries with accurate references can be automatically generated from the literature using a commercial LLM and a chain of prompts and checks. Manual assessment was carried out for a subset of summaries, with the majority being rated extremely high quality. We apply our tool to a selection of >4600 ncRNAs and make the generated summaries available via the RNAcentral resource. We conclude that automated literature summarization is feasible with the current generation of LLMs, provided that careful prompting and automated checking are applied. Database URL: https://rnacentral.org/.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.6000,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11833236/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Database: The Journal of Biological Databases and Curation","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/database/baaf006","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Curation of literature in life sciences is a growing challenge. The continued increase in the rate of publication, coupled with the relatively fixed number of curators worldwide, presents a major challenge to developers of biomedical knowledgebases. Very few knowledgebases have resources to scale to the whole relevant literature and all have to prioritize their efforts. In this work, we take a first step to alleviating the lack of curator time in RNA science by generating summaries of literature for noncoding RNAs using large language models (LLMs). We demonstrate that high-quality, factually accurate summaries with accurate references can be automatically generated from the literature using a commercial LLM and a chain of prompts and checks. Manual assessment was carried out for a subset of summaries, with the majority being rated extremely high quality. We apply our tool to a selection of >4600 ncRNAs and make the generated summaries available via the RNAcentral resource. We conclude that automated literature summarization is feasible with the current generation of LLMs, provided that careful prompting and automated checking are applied. Database URL: https://rnacentral.org/.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
LitSumm:非编码rna文献综述的大型语言模型。
生命科学文献的管理是一个越来越大的挑战。出版速度的持续增长,加上世界范围内馆长的数量相对固定,对生物医学知识库的开发人员提出了重大挑战。很少有知识库有资源可以扩展到整个相关文献,并且所有知识库都必须优先考虑他们的努力。在这项工作中,我们通过使用大型语言模型(llm)生成非编码RNA的文献摘要,迈出了缓解RNA科学缺乏管理员时间的第一步。我们证明,使用商业法学硕士和一系列提示和检查,可以从文献中自动生成具有准确参考文献的高质量,事实准确的摘要。对摘要的一个子集进行了人工评估,其中大多数被评为极高质量。我们将我们的工具应用于选择的bb104600个ncrna,并通过rnaccentral资源提供生成的摘要。我们得出的结论是,如果采用仔细的提示和自动检查,自动文献摘要在当前一代法学硕士中是可行的。数据库地址:https://rnacentral.org/。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Database: The Journal of Biological Databases and Curation
Database: The Journal of Biological Databases and Curation MATHEMATICAL & COMPUTATIONAL BIOLOGY-
CiteScore
9.00
自引率
3.40%
发文量
100
审稿时长
>12 weeks
期刊介绍: Huge volumes of primary data are archived in numerous open-access databases, and with new generation technologies becoming more common in laboratories, large datasets will become even more prevalent. The archiving, curation, analysis and interpretation of all of these data are a challenge. Database development and biocuration are at the forefront of the endeavor to make sense of this mounting deluge of data. Database: The Journal of Biological Databases and Curation provides an open access platform for the presentation of novel ideas in database research and biocuration, and aims to help strengthen the bridge between database developers, curators, and users.
期刊最新文献
GdbMTB: a manually curated genomic database of magnetotactic bacteria. H-SPAR DB: human spaceflight platform for analysis and research-an integrative omics database for space health. ePerturbDB: enhancer's experimental perturbation database. SynVectorDB: embedding-based retrieval system for synthetic biology parts. Tear fluid database: a reference website for tear fluid proteomics.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1