Performance of two large language models for data extraction in evidence synthesis

Research Synthesis Methods, 15(5), 818–824. Published: 2024-06-19. DOI: 10.1002/jrsm.1732. Impact factor 5.0; JCR Q1, Mathematical & Computational Biology.
Amanda Konet, Ian Thomas, Gerald Gartlehner, Leila Kahwati, Rainer Hilscher, Shannon Kugley, Karen Crotty, Meera Viswanathan, Robert Chew
Cited by: 0

Abstract

Accurate data extraction is a key component of evidence synthesis and critical to valid results. The advent of publicly available large language models (LLMs) has generated interest in these tools for evidence synthesis and created uncertainty about the choice of LLM. We compare the performance of two widely available LLMs (Claude 2 and GPT-4) for extracting pre-specified data elements from 10 published articles included in a previously completed systematic review. We use prompts and full study PDFs to compare the outputs from the browser versions of Claude 2 and GPT-4. GPT-4 required use of a third-party plug-in to upload and parse PDFs. Accuracy was high for Claude 2 (96.3%). The accuracy of GPT-4 with the plug-in was lower (68.8%); however, most of the errors were due to the plug-in. Both LLMs correctly recognized when pre-specified data elements were missing from the source PDF and generated correct information for data elements that were not reported explicitly in the articles. A secondary analysis demonstrated that, when provided selected text from the PDFs, Claude 2 and GPT-4 accurately extracted 98.7% and 100% of the data elements, respectively. Limitations include the narrow scope of the study PDFs used, that prompt development was completed using only Claude 2, and that we cannot guarantee the open-source articles were not used to train the LLMs. This study highlights the potential for LLMs to revolutionize data extraction but underscores the importance of accurate PDF parsing. For now, it remains essential for a human investigator to validate LLM extractions.
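The accuracy figures above follow from element-level scoring: each pre-specified data element extracted by the LLM is compared against a gold-standard value, with elements absent from a source expected to be extracted as missing. A minimal sketch of that scoring, with hypothetical element names and values (the paper does not publish its scoring code):

```python
def extraction_accuracy(extracted, reference):
    """Fraction of pre-specified data elements whose extracted value
    matches the gold-standard value. Elements the article does not
    report are expected to come back as None (i.e., the LLM should
    recognize they are missing rather than hallucinate a value)."""
    if not reference:
        raise ValueError("reference must contain at least one data element")
    correct = sum(
        1 for element, gold in reference.items()
        if extracted.get(element) == gold
    )
    return correct / len(reference)

# Hypothetical extraction from one study; None marks an element
# the source PDF does not report.
gold = {"n_randomized": 120, "mean_age": 54.2, "attrition_pct": None}
llm  = {"n_randomized": 120, "mean_age": 54.0, "attrition_pct": None}
print(round(extraction_accuracy(llm, gold), 3))  # → 0.667 (2 of 3 match)
```

Treating "correctly identified as missing" as a scorable outcome matters here, since the study explicitly credits both LLMs for recognizing absent data elements.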

Source journal

Research Synthesis Methods (Mathematical & Computational Biology; Multidisciplinary Sciences)
CiteScore: 16.90
Self-citation rate: 3.10%
Articles per year: 75
About the journal: Research Synthesis Methods is a peer-reviewed journal focused on the development and dissemination of methods for conducting systematic research synthesis. It provides a platform for exchanging ideas on designing, conducting, analyzing, interpreting, reporting, and applying research synthesis. While research synthesis is most common in the health and social sciences, the journal also welcomes contributions from other fields, aiming to foster cross-disciplinary collaboration and improve the quality and effectiveness of research synthesis methods.
Latest articles from this journal

- Issue Information
- A tutorial on aggregating evidence from conceptual replication studies using the product Bayes factor
- Evolving use of the Cochrane Risk of Bias 2 tool in biomedical systematic reviews
- Exploring methodological approaches used in network meta-analysis of psychological interventions: A scoping review
- An evaluation of the performance of stopping rules in AI-aided screening for psychological meta-analytical research