Data extraction for evidence synthesis using a large language model: A proof-of-concept study

Gerald Gartlehner, Leila Kahwati, Rainer Hilscher, Ian Thomas, Shannon Kugley, Karen Crotty, Meera Viswanathan, Barbara Nussbaumer-Streit, Graham Booth, Nathaniel Erskine, Amanda Konet, Robert Chew

Research Synthesis Methods, 15(4), 576–589. Published 2024-03-03. DOI: 10.1002/jrsm.1710 (https://onlinelibrary.wiley.com/doi/10.1002/jrsm.1710)
Abstract
Data extraction is a crucial, yet labor-intensive and error-prone part of evidence synthesis. To date, efforts to harness machine learning for enhancing efficiency of the data extraction process have fallen short of achieving sufficient accuracy and usability. With the release of large language models (LLMs), new possibilities have emerged to increase efficiency and accuracy of data extraction for evidence synthesis. The objective of this proof-of-concept study was to assess the performance of an LLM (Claude 2) in extracting data elements from published studies, compared with human data extraction as employed in systematic reviews. Our analysis utilized a convenience sample of 10 English-language, open-access publications of randomized controlled trials included in a single systematic review. We selected 16 distinct types of data, posing varying degrees of difficulty (160 data elements across 10 studies). We used the browser version of Claude 2 to upload the portable document format of each publication and then prompted the model for each data element. Across 160 data elements, Claude 2 demonstrated an overall accuracy of 96.3% with a high test–retest reliability (replication 1: 96.9%; replication 2: 95.0% accuracy). Overall, Claude 2 made 6 errors on 160 data items. The most common errors (n = 4) were missed data items. Importantly, Claude 2's ease of use was high; it required no technical expertise or labeled training data for effective operation (i.e., zero-shot learning). Based on findings of our proof-of-concept study, leveraging LLMs has the potential to substantially enhance the efficiency and accuracy of data extraction for evidence syntheses.
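The workflow described in the abstract (providing the model the full text of each publication and prompting it once per data element, with no labeled training data) can also be scripted. The sketch below is illustrative only: the authors used the claude.ai browser interface with PDF upload, whereas this uses the Anthropic Python SDK; the model identifier, prompt wording, file name, and example data elements are assumptions for demonstration, not the study's protocol.

```python
# Illustrative sketch of zero-shot data extraction with an LLM API.
# Assumptions: Anthropic Python SDK, pypdf for text extraction, a hypothetical
# file name, and hypothetical data element descriptions. Not the authors' setup.
import anthropic
from pypdf import PdfReader

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical examples of data element types (the study prompted for 16 per trial).
DATA_ELEMENTS = [
    "first author and year of publication",
    "number of participants randomized",
    "duration of follow-up",
]


def pdf_to_text(path: str) -> str:
    """Concatenate the extractable text of every page of a trial publication."""
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_element(study_text: str, element: str) -> str:
    """Zero-shot prompt: ask the model for one data element, grounded in the study text."""
    prompt = (
        "You are assisting with data extraction for a systematic review.\n"
        f"From the study below, report the following data element: {element}.\n"
        "If the information is not reported, answer 'not reported'.\n\n"
        f"STUDY TEXT:\n{study_text}"
    )
    response = client.messages.create(
        model="claude-2.1",  # assumed identifier; the paper evaluated Claude 2
        max_tokens=300,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text.strip()


if __name__ == "__main__":
    text = pdf_to_text("trial_report.pdf")  # hypothetical file name
    for element in DATA_ELEMENTS:
        print(f"{element}: {extract_element(text, element)}")
```

In the study's evaluation, each model answer was compared against human extraction; accuracy is simply the share of the 160 prompted data elements answered correctly (154/160 = 96.3%), repeated twice more to gauge test–retest reliability.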
About the Journal
Research Synthesis Methods is a peer-reviewed journal focused on the development and dissemination of methods for conducting systematic research synthesis. Its aim is to advance the knowledge and application of research synthesis methods across disciplines.
Our journal provides a platform for the exchange of ideas and knowledge related to designing, conducting, analyzing, interpreting, reporting, and applying research synthesis. While research synthesis is commonly practiced in the health and social sciences, our journal also welcomes contributions from other fields to enrich the methodologies employed in research synthesis across scientific disciplines.
By bridging different disciplines, we aim to foster collaboration and cross-fertilization of ideas, ultimately enhancing the quality and effectiveness of research synthesis methods. Whether you are a researcher, practitioner, or stakeholder involved in research synthesis, our journal strives to offer valuable insights and practical guidance for your work.