A question-answering framework for automated abstract screening using large language models.

IF 4.7 · CAS Tier 2 (Medicine) · JCR Q1 (Computer Science, Information Systems) · Journal of the American Medical Informatics Association · Pub Date: 2024-09-01 · DOI: 10.1093/jamia/ocae166
Opeoluwa Akinseloyin, Xiaorui Jiang, Vasile Palade
{"title":"利用大型语言模型自动筛选摘要的问题解答框架。","authors":"Opeoluwa Akinseloyin, Xiaorui Jiang, Vasile Palade","doi":"10.1093/jamia/ocae166","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>This paper aims to address the challenges in abstract screening within systematic reviews (SR) by leveraging the zero-shot capabilities of large language models (LLMs).</p><p><strong>Methods: </strong>We employ LLM to prioritize candidate studies by aligning abstracts with the selection criteria outlined in an SR protocol. Abstract screening was transformed into a novel question-answering (QA) framework, treating each selection criterion as a question addressed by LLM. The framework involves breaking down the selection criteria into multiple questions, properly prompting LLM to answer each question, scoring and re-ranking each answer, and combining the responses to make nuanced inclusion or exclusion decisions.</p><p><strong>Results and discussion: </strong>Large-scale validation was performed on the benchmark of CLEF eHealth 2019 Task 2: Technology-Assisted Reviews in Empirical Medicine. Focusing on GPT-3.5 as a case study, the proposed QA framework consistently exhibited a clear advantage over traditional information retrieval approaches and bespoke BERT-family models that were fine-tuned for prioritizing candidate studies (ie, from the BERT to PubMedBERT) across 31 datasets of 4 categories of SRs, underscoring their high potential in facilitating abstract screening. The experiments also showcased the viability of using selection criteria as a query for reference prioritization. The experiments also showcased the viability of the framework using different LLMs.</p><p><strong>Conclusion: </strong>Investigation justified the indispensable value of leveraging selection criteria to improve the performance of automated abstract screening. LLMs demonstrated proficiency in prioritizing candidate studies for abstract screening using the proposed QA framework. Significant performance improvements were obtained by re-ranking answers using the semantic alignment between abstracts and selection criteria. This further highlighted the pertinence of utilizing selection criteria to enhance abstract screening.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"1939-1952"},"PeriodicalIF":4.7000,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11339526/pdf/","citationCount":"0","resultStr":"{\"title\":\"A question-answering framework for automated abstract screening using large language models.\",\"authors\":\"Opeoluwa Akinseloyin, Xiaorui Jiang, Vasile Palade\",\"doi\":\"10.1093/jamia/ocae166\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objective: </strong>This paper aims to address the challenges in abstract screening within systematic reviews (SR) by leveraging the zero-shot capabilities of large language models (LLMs).</p><p><strong>Methods: </strong>We employ LLM to prioritize candidate studies by aligning abstracts with the selection criteria outlined in an SR protocol. Abstract screening was transformed into a novel question-answering (QA) framework, treating each selection criterion as a question addressed by LLM. 
The framework involves breaking down the selection criteria into multiple questions, properly prompting LLM to answer each question, scoring and re-ranking each answer, and combining the responses to make nuanced inclusion or exclusion decisions.</p><p><strong>Results and discussion: </strong>Large-scale validation was performed on the benchmark of CLEF eHealth 2019 Task 2: Technology-Assisted Reviews in Empirical Medicine. Focusing on GPT-3.5 as a case study, the proposed QA framework consistently exhibited a clear advantage over traditional information retrieval approaches and bespoke BERT-family models that were fine-tuned for prioritizing candidate studies (ie, from the BERT to PubMedBERT) across 31 datasets of 4 categories of SRs, underscoring their high potential in facilitating abstract screening. The experiments also showcased the viability of using selection criteria as a query for reference prioritization. The experiments also showcased the viability of the framework using different LLMs.</p><p><strong>Conclusion: </strong>Investigation justified the indispensable value of leveraging selection criteria to improve the performance of automated abstract screening. LLMs demonstrated proficiency in prioritizing candidate studies for abstract screening using the proposed QA framework. Significant performance improvements were obtained by re-ranking answers using the semantic alignment between abstracts and selection criteria. This further highlighted the pertinence of utilizing selection criteria to enhance abstract screening.</p>\",\"PeriodicalId\":50016,\"journal\":{\"name\":\"Journal of the American Medical Informatics Association\",\"volume\":\" \",\"pages\":\"1939-1952\"},\"PeriodicalIF\":4.7000,\"publicationDate\":\"2024-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11339526/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of the American Medical Informatics Association\",\"FirstCategoryId\":\"91\",\"ListUrlMain\":\"https://doi.org/10.1093/jamia/ocae166\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the American Medical Informatics Association","FirstCategoryId":"91","ListUrlMain":"https://doi.org/10.1093/jamia/ocae166","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract


Objective: This paper aims to address the challenges in abstract screening within systematic reviews (SR) by leveraging the zero-shot capabilities of large language models (LLMs).

Methods: We employ an LLM to prioritize candidate studies by aligning abstracts with the selection criteria outlined in an SR protocol. Abstract screening was transformed into a novel question-answering (QA) framework, treating each selection criterion as a question addressed by the LLM. The framework involves breaking down the selection criteria into multiple questions, properly prompting the LLM to answer each question, scoring and re-ranking each answer, and combining the responses to make nuanced inclusion or exclusion decisions.
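The abstract does not disclose the exact prompts, answer scale, or combination rule, so the following is only a minimal sketch of the described loop under stated assumptions: each criterion is posed to GPT-3.5 as a 0-10 question via the OpenAI chat API, and the per-criterion scores are averaged into a single ranking score. The names `ask_criterion` and `screen` are illustrative, not the paper's.

```python
# Minimal sketch of the per-criterion QA screening loop (assumptions noted above).
# Requires `pip install openai` and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def ask_criterion(abstract: str, criterion: str) -> float:
    """Ask the LLM how well an abstract satisfies one selection criterion,
    returning a score in [0, 1] parsed from its answer."""
    prompt = (
        "You are screening studies for a systematic review.\n"
        f"Selection criterion: {criterion}\n"
        f"Abstract: {abstract}\n"
        "On a scale of 0-10, how well does the abstract satisfy this "
        "criterion? Reply with a single integer."
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    answer = (resp.choices[0].message.content or "").strip()
    try:
        return int(answer) / 10.0
    except ValueError:
        return 0.0  # an unparseable answer counts as no evidence

def screen(abstract: str, criteria: list[str]) -> float:
    """Combine per-criterion scores into one ranking score (a plain mean here;
    the paper's actual combination rule is not given in the abstract)."""
    scores = [ask_criterion(abstract, c) for c in criteria]
    return sum(scores) / len(scores)

# Candidate studies are then prioritized by sorting on screen(...) descending.
```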

Results and discussion: Large-scale validation was performed on the benchmark of CLEF eHealth 2019 Task 2: Technology-Assisted Reviews in Empirical Medicine. Using GPT-3.5 as a case study, the proposed QA framework consistently exhibited a clear advantage over traditional information retrieval approaches and bespoke BERT-family models fine-tuned for prioritizing candidate studies (ie, from BERT to PubMedBERT) across 31 datasets covering 4 categories of SRs, underscoring its high potential in facilitating abstract screening. The experiments also showcased the viability of using the selection criteria as a query for reference prioritization, as well as the viability of the framework with different LLMs.
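For comparison, the "selection criteria as a query" finding can be illustrated with a classic IR baseline. This is a hedged sketch, not the paper's implementation: it assumes the criteria are concatenated into a single query, uses naive whitespace tokenization, and relies on the `rank-bm25` package; `prioritize_by_criteria` is an illustrative name.

```python
# BM25 baseline: rank abstracts against the concatenated selection criteria.
# Requires `pip install rank-bm25`.
from rank_bm25 import BM25Okapi

def prioritize_by_criteria(abstracts: list[str], criteria: list[str]) -> list[int]:
    """Return abstract indices sorted by BM25 relevance to the criteria."""
    def tokenize(text: str) -> list[str]:
        return text.lower().split()  # naive whitespace tokenizer

    bm25 = BM25Okapi([tokenize(a) for a in abstracts])
    scores = bm25.get_scores(tokenize(" ".join(criteria)))
    return sorted(range(len(abstracts)), key=lambda i: scores[i], reverse=True)
```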

Conclusion: The investigation justified the indispensable value of leveraging selection criteria to improve the performance of automated abstract screening. LLMs demonstrated proficiency in prioritizing candidate studies for abstract screening using the proposed QA framework. Significant performance improvements were obtained by re-ranking answers using the semantic alignment between abstracts and the selection criteria, which further highlights the pertinence of selection criteria for enhancing abstract screening.
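The re-ranking step credited above can likewise be sketched. The abstract names neither an encoder nor a blending rule, so this assumes the `sentence-transformers` package with the `all-MiniLM-L6-v2` model and a simple linear blend of the LLM answer score with the best criterion similarity; `rerank` and `alpha` are illustrative choices.

```python
# Hedged sketch: re-rank by blending LLM answer scores with the semantic
# similarity between each abstract and its best-matching selection criterion.
# Requires `pip install sentence-transformers`.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def rerank(abstracts: list[str], criteria: list[str],
           llm_scores: list[float], alpha: float = 0.5) -> list[int]:
    """Return abstract indices sorted by alpha * LLM score +
    (1 - alpha) * max cosine similarity to any selection criterion."""
    abs_emb = encoder.encode(abstracts, convert_to_tensor=True)
    crit_emb = encoder.encode(criteria, convert_to_tensor=True)
    sim = util.cos_sim(abs_emb, crit_emb).max(dim=1).values  # best criterion match
    blended = [alpha * s + (1 - alpha) * float(v)
               for s, v in zip(llm_scores, sim)]
    return sorted(range(len(abstracts)), key=lambda i: blended[i], reverse=True)
```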

Source journal: Journal of the American Medical Informatics Association (Medicine / Computer Science: Interdisciplinary Applications)
CiteScore: 14.50
Self-citation rate: 7.80%
Annual output: 230 articles
Review turnaround: 3-8 weeks
About the journal: JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives and reviews also help readers stay connected with the most important informatics developments in implementation, policy and education.