Research Synthesis Methods: Latest Articles

StudyTypeTeller: Large language models to automatically classify research study types for systematic reviews.
IF 6.1 | Zone 2, Biology | Q1 Mathematical & Computational Biology | Pub Date: 2025-11-01 | Epub Date: 2025-09-11 | DOI: 10.1017/rsm.2025.10031
Simona Emilova Doneva, Shirin de Viragh, Hanna Hubarava, Stefan Schandelmaier, Matthias Briel, Benjamin Victor Ineichen

Abstract screening, a labor-intensive aspect of systematic review, is increasingly challenging due to the rising volume of scientific publications. Recent advances suggest that generative large language models like generative pre-trained transformer (GPT) could aid this process by classifying references into study types such as randomized controlled trials (RCTs) or animal studies prior to abstract screening. However, it is unknown how these GPT models perform in classifying such scientific study types in the biomedical field. Additionally, their performance has not been directly compared with earlier transformer-based models like bidirectional encoder representations from transformers (BERT). To address this, we developed a human-annotated corpus of 2,645 PubMed titles and abstracts, annotated for 14 study types, including different types of RCTs and animal studies, systematic reviews, study protocols, case reports, as well as in vitro studies. Using this corpus, we compared the performance of GPT-3.5 and GPT-4 in automatically classifying these study types against established BERT models. Our results show that fine-tuned pretrained BERT models consistently outperformed GPT models, achieving F1-scores above 0.8, compared to approximately 0.6 for GPT models. Advanced prompting strategies did not substantially boost GPT performance. In conclusion, these findings highlight that, even though GPT models benefit from advanced capabilities and extensive training data, their performance in niche tasks like scientific multi-class study classification is inferior to smaller fine-tuned models. Nevertheless, the use of automated methods remains promising for reducing the volume of records, making the screening of large reference libraries more feasible. Our corpus is openly available and can be used to harness other natural language processing (NLP) approaches.
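
The abstract reports F1-scores for a multi-class task but does not say how they were averaged. As a rough illustration only, the sketch below computes per-class and macro-averaged F1 in plain Python. The function names and the toy labels are hypothetical and are not taken from the paper or its corpus.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (macro averaging)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels, for illustration only.
y_true = ["RCT", "RCT", "animal", "animal", "review", "review"]
y_pred = ["RCT", "animal", "animal", "animal", "review", "RCT"]

print(per_class_f1(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 3))
```

Macro averaging weights all classes equally, so rare study types count as much as common ones; whether the paper used macro, micro, or weighted averaging is not stated in the abstract.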

Research Synthesis Methods, vol. 16, no. 6, pp. 1005-1024. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12657658/pdf/
Citations: 0
Optimal large language models to screen citations for systematic reviews.
IF 6.1 | Zone 2, Biology | Q1 Mathematical & Computational Biology | Pub Date: 2025-11-01 | Epub Date: 2025-06-23 | DOI: 10.1017/rsm.2025.10014
Takehiko Oami, Yohei Okada, Taka-Aki Nakada

Recent studies highlight the potential of large language models (LLMs) in citation screening for systematic reviews; however, the efficiency of individual LLMs for this application remains unclear. This study aimed to compare accuracy, time-related efficiency, cost, and consistency across four LLMs (GPT-4o, Gemini 1.5 Pro, Claude 3.5 Sonnet, and Llama 3.3 70B) for literature screening tasks. The models screened for clinical questions from the Japanese Clinical Practice Guidelines for the Management of Sepsis and Septic Shock 2024. Sensitivity and specificity were calculated for each model based on conventional citation screening results for qualitative assessment. We also recorded the time and cost of screening and assessed consistency to verify reproducibility. A post hoc analysis explored whether integrating outputs from multiple models could enhance screening accuracy. GPT-4o and Llama 3.3 70B achieved high specificity but lower sensitivity, while Gemini 1.5 Pro and Claude 3.5 Sonnet exhibited higher sensitivity at the cost of lower specificity. Citation screening times and costs varied, with GPT-4o being the fastest and Llama 3.3 70B the most cost-effective. Consistency was comparable among the models. An ensemble approach combining model outputs improved sensitivity but increased the number of false positives, requiring additional review effort. Each model demonstrated distinct strengths, effectively streamlining citation screening by saving time and reducing workload. However, reviewing false positives remains a challenge. Combining models may enhance sensitivity, indicating the potential of LLMs to optimize systematic review workflows.
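
The trade-off described above can be made concrete with a small sketch. The abstract does not say how model outputs were combined; the code below assumes a simple union rule (include a citation if any model includes it), which is one combination that raises sensitivity while adding false positives. All names and the toy decisions are hypothetical.

```python
def sensitivity_specificity(reference, predicted):
    """Compute (sensitivity, specificity) from boolean include decisions.

    reference: decisions from conventional (human) screening.
    predicted: decisions from a model or an ensemble.
    """
    tp = sum(r and p for r, p in zip(reference, predicted))
    fn = sum(r and not p for r, p in zip(reference, predicted))
    tn = sum((not r) and (not p) for r, p in zip(reference, predicted))
    fp = sum((not r) and p for r, p in zip(reference, predicted))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both includes and excludes")
    return tp / (tp + fn), tn / (tn + fp)


def union_ensemble(*model_outputs):
    """Assumed rule: include a citation if at least one model includes it."""
    return [any(votes) for votes in zip(*model_outputs)]


# Hypothetical toy decisions for eight citations (True = include).
reference = [True, True, True, False, False, False, False, False]
model_a = [False, False, True, False, True, False, False, False]
model_b = [True, True, False, True, False, False, False, False]

combined = union_ensemble(model_a, model_b)
for name, output in [("A", model_a), ("B", model_b), ("A or B", combined)]:
    sens, spec = sensitivity_specificity(reference, output)
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy case the union catches every relevant citation but also inherits each model's false positives, which mirrors the extra review effort the abstract mentions.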

Research Synthesis Methods, vol. 16, no. 6, pp. 859-875. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12657656/pdf/
Citations: 0
NMAsurv: An R Shiny application for network meta-analysis based on survival data.
IF 6.1 | Zone 2, Biology | Q1 Mathematical & Computational Biology | Pub Date: 2025-11-01 | Epub Date: 2025-07-10 | DOI: 10.1017/rsm.2025.10020
Taihang Shao, Mingye Zhao, Fenghao Shi, Mingjun Rui, Wenxi Tang

Network meta-analysis (NMA) is becoming increasingly important, especially in the field of medicine, as it allows for comparisons across multiple trials with different interventions. For time-to-event data, that is, survival data, traditional NMA based on the proportional hazards (PH) assumption simply synthesizes reported hazard ratios (HRs). Novel methods for NMA based on the non-PH assumption have been proposed and implemented using R software. However, these methods often involve complex methodologies and require advanced programming skills, creating a barrier for many researchers. Therefore, we developed an R Shiny tool, NMAsurv (https://psurvivala.shinyapps.io/NMAsurv/). NMAsurv allows users with little or zero background in R to conduct survival-data-based NMA effortlessly. The tool supports various functions such as drawing network plots, testing the PH assumption, and building NMA models. Users can input either reconstructed pseudo-individual participant data or aggregated data. NMAsurv offers a user-friendly interface for extracting parameter estimations from various NMA models, including fractional polynomial, piecewise exponential models, parametric survival models, Cox PH model, and generalized gamma model. Additionally, it enables users to effortlessly create survival and HR plots. All operations can be performed by an intuitive "point-and-click" interface. In this study, we introduce all the functionalities and features of NMAsurv and demonstrate its application using a real-world NMA example.

Research Synthesis Methods, vol. 16, no. 6, pp. 1042-1056. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12657653/pdf/
Citations: 0
What happens to qualitative studies initially presented as conference abstracts: A survey among study authors.
IF 6.1 | Zone 2, Biology | Q1 Mathematical & Computational Biology | Pub Date: 2025-11-01 | Epub Date: 2025-09-05 | DOI: 10.1017/rsm.2025.10033
Marwin Weber, Simon Lewin, Joerg J Meerpohl, Heather Menzies Munthe-Kaas, Rigmor Berg, Andrew Booth, Claire Glenton, Jane Noyes, Ingrid Toews

Qualitative research addresses important healthcare questions, including patients' experiences with interventions. Qualitative evidence syntheses combine findings from individual studies and are increasingly used to inform health guidelines. However, dissemination bias (the selective non-dissemination of studies or findings) may distort the body of evidence. This study examined reasons for the non-dissemination of qualitative studies. We identified conference abstracts reporting qualitative, health-related studies. We invited authors to answer a survey containing quantitative and qualitative questions. We performed descriptive analyses on the quantitative data and inductive thematic analysis on the qualitative data. Most of the 142 respondents were female, established researchers. About a third reported that their study had not been published in full after their conference presentation. The main reasons were time constraints, career changes, and a lack of interest. Few indicated non-publication due to the nature of the study findings. Decisions not to publish were largely made by author teams. Half of the 72% who published their study reported that all findings were included in the publication. This study highlights researchers' reasons for non-dissemination of qualitative research. One-third of studies presented as conference abstracts remained unpublished, but non-dissemination was rarely linked to the study findings. Further research is needed to understand the systematic non-dissemination of qualitative studies.

Research Synthesis Methods, vol. 16, no. 6, pp. 1025-1034. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12657646/pdf/
Citations: 0
Trials and triangles: Network meta-analysis of multi-arm trials with correlated arms.
IF 6.1 | Zone 2, Biology | Q1 Mathematical & Computational Biology | Pub Date: 2025-11-01 | Epub Date: 2025-08-01 | DOI: 10.1017/rsm.2025.10026
Gerta Rücker, Guido Schwarzer

For network meta-analysis (NMA), we usually assume that the treatment arms are independent within each included trial. This assumption is justified for parallel design trials and leads to a property we call consistency of variances for both multi-arm trials and NMA estimates. However, the assumption is violated for trials with correlated arms, for example, split-body trials. For multi-arm trials with correlated arms, the variance of a contrast is not the sum of the arm-based variances, but comes with a correlation term. This may lead to violations of variance consistency, and the inconsistency of variances may even propagate to the NMA estimates. We explain this using a geometric analogy where three-arm trials correspond to triangles and four-arm trials correspond to tetrahedrons. We also investigate which information has to be extracted for a multi-arm trial with correlated arms and provide an algorithm to analyze NMAs including such trials.
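
The statement about the correlation term can be written out with the standard variance identity for a difference of two estimates. The notation below ($\hat\theta_A$, $\hat\theta_B$ for arm-level estimates, $\sigma_A^2$, $\sigma_B^2$ for their variances, $\rho_{AB}$ for their correlation) is assumed here for illustration and is not taken from the paper.

```latex
\operatorname{Var}\bigl(\hat\theta_A - \hat\theta_B\bigr)
  = \sigma_A^2 + \sigma_B^2 - 2\,\rho_{AB}\,\sigma_A\,\sigma_B
```

With independent arms ($\rho_{AB} = 0$) this reduces to the sum of the arm-based variances. The expression has the same form as the law of cosines, $c^2 = a^2 + b^2 - 2ab\cos\gamma$, which is one possible reading of the triangle analogy; the paper's own construction may differ.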

Research Synthesis Methods, vol. 16, no. 6, pp. 961-974. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12657662/pdf/
Citations: 0
Knowledge user involvement is still uncommon in published rapid reviews: a meta-research cross-sectional study.
IF 6.1 | Zone 2, Biology | Q1 Mathematical & Computational Biology | Pub Date: 2025-11-01 | Epub Date: 2025-07-10 | DOI: 10.1017/rsm.2025.10018
Barbara Nussbaumer-Streit, Dominic Ledinger, Christina Kien, Irma Klerings, Emma Persad, Andrea Chapman, Claus Nowak, Arianna Gadinger, Lisa Affengruber, Maureen Smith, Gerald Gartlehner, Ursula Griebler

Background: Involving knowledge users (KUs) such as patients, clinicians, or health policymakers is particularly relevant when conducting rapid reviews (RRs), as they should be tailored to decision-makers' needs. However, little is known about how common KU involvement currently is in RRs.

Objectives: We wanted to assess the proportion of KU involvement reported in recently published RRs (2021 onwards), which groups of KUs were involved in each phase of the RR process, to what extent, and which factors were associated with KU involvement in RRs.

Methods: We conducted a meta-research cross-sectional study. A systematic literature search in Ovid MEDLINE and Epistemonikos in January 2024 identified 2,493 unique records. We dually screened the identified records (partly with assistance from an artificial intelligence (AI)-based application) until we reached the a priori calculated sample size of 104 RRs. We dually extracted data and analyzed it descriptively.

Results: The proportion of RRs that reported KU involvement was 19% (95% confidence interval [CI]: 12%-28%). Most often, KUs were involved during the initial preparation of the RR, the systematic searches, and the interpretation and dissemination of results. Researchers/content experts and public/patient partners were the KU groups most often involved. KU involvement was more common in RRs focusing on patient involvement/shared decision-making, having a published protocol, and being commissioned.

Conclusions: Reporting KU involvement in published RRs is uncommon and often vague. Future research should explore barriers and facilitators for KU involvement and its reporting in RRs. Guidance regarding reporting on KU involvement in RRs is needed.
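
To show how a proportion and its 95% confidence interval of this kind can be computed, here is a minimal sketch using the Wilson score interval. Two things are assumptions, not facts from the abstract: the count of 20 out of 104 RRs (chosen because it rounds to 19%) and the choice of interval method. The authors may have used a different count or method, such as an exact interval.

```python
import math


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (default z: 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Assumed count: 20 of the 104 sampled RRs (about 19%); not stated in the abstract.
low, high = wilson_interval(20, 104)
print(f"{20 / 104:.1%} (95% CI: {low:.1%} to {high:.1%})")
```

Under these assumptions the interval lands close to, but not necessarily exactly on, the reported 12%-28%; small differences are expected between interval methods.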

Research Synthesis Methods, vol. 16, no. 6, pp. 876-899. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12657652/pdf/
Citations: 0
Evaluation of semi-automated record screening methods for systematic reviews of prognosis studies and intervention studies.
IF 6.1 | Zone 2, Biology | Q1 Mathematical & Computational Biology | Pub Date: 2025-11-01 | Epub Date: 2025-07-22 | DOI: 10.1017/rsm.2025.10025
Isa Spiero, Artuur M Leeuwenberg, Karel G M Moons, Lotty Hooft, Johanna A A Damen

Systematic reviews (SRs) synthesize evidence through a rigorous, labor-intensive, and costly process. To accelerate the title-abstract screening phase of SRs, several artificial intelligence (AI)-based semi-automated screening tools have been developed to reduce workload by prioritizing relevant records. However, their performance is primarily evaluated for SRs of intervention studies, which generally have well-structured abstracts. Here, we evaluate whether screening tool performance is equally effective for SRs of prognosis studies that have larger heterogeneity between abstracts. We conducted retrospective simulations on prognosis and intervention reviews using a screening tool (ASReview). We also evaluated the effects of review scope (i.e., breadth of the research question), number of (relevant) records, and modeling methods within the tool. Performance was assessed in terms of recall (i.e., sensitivity), precision at 95% recall (i.e., positive predictive value at 95% recall), and workload reduction (work saved over sampling at 95% recall [WSS@95%]). The WSS@95% was slightly worse for prognosis reviews (range: 0.324-0.597) than for intervention reviews (range: 0.613-0.895). The precision was higher for prognosis (range: 0.115-0.400) compared to intervention reviews (range: 0.024-0.057). These differences were primarily due to the larger number of relevant records in the prognosis reviews. The modeling methods and the scope of the prognosis review did not significantly impact tool performance. We conclude that the larger abstract heterogeneity of prognosis studies does not substantially affect the effectiveness of screening tools for SRs of prognosis. Further evaluation studies including a standardized evaluation framework are needed to enable prospective decisions on the reliable use of screening tools.
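
The workload metric named above can be sketched for a single ranked screening run. The code uses the commonly cited definition of work saved over sampling, WSS@R = (records left unscreened when recall R is reached) / N - (1 - R); whether the study computed it exactly this way is an assumption, and the function name and toy ranking are hypothetical.

```python
import math


def wss_at_recall(ranked_labels, recall=0.95):
    """Work saved over sampling for one ranked list of 0/1 relevance labels.

    ranked_labels: labels in the order the tool would present the records
    (1 = relevant, 0 = irrelevant).
    """
    n = len(ranked_labels)
    total_relevant = sum(ranked_labels)
    if n == 0 or total_relevant == 0:
        raise ValueError("need at least one record and one relevant record")
    # Number of relevant records that must be found to reach the target recall;
    # the small epsilon guards against floating-point overshoot before ceil.
    needed = math.ceil(recall * total_relevant - 1e-9)
    found = 0
    for screened, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            return (n - screened) / n - (1 - recall)
    raise RuntimeError("target recall was not reached")  # defensive guard


# Hypothetical ranking: 20 records, 4 relevant, the last one found at position 7.
ranking = [1, 1, 0, 1, 0, 0, 1] + [0] * 13
print(round(wss_at_recall(ranking), 3))
```

A WSS@95% of 0 means the ranking saves nothing over screening records in random order, so larger values indicate more work saved.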

Research Synthesis Methods, vol. 16, issue 6, pp. 975-989. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12657655/pdf/
Citations: 0
Combining search filters for randomized controlled trials with the Cochrane RCT Classifier in Covidence: a methodological validation study.
IF 6.1 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-11-01 Epub Date: 2025-08-28 DOI: 10.1017/rsm.2025.10023
Klas Moberg, Carl Gornitzki

Our objective was to evaluate the recall and number needed to read (NNR) for the Cochrane RCT Classifier compared to and in combination with established search filters developed for Ovid MEDLINE and Embase.com. A gold standard set of 1,103 randomized controlled trials (RCTs) was created to calculate recall for the Cochrane RCT Classifier in Covidence, the Cochrane sensitivity-maximizing RCT filter in Ovid MEDLINE and the Cochrane Embase RCT filter for Embase.com. In addition, the classifier and the filters were validated in three case studies using reports from the Swedish Agency for Health Technology Assessment and Assessment of Social Services to assess impact on search results and NNR. The Cochrane RCT Classifier had the highest recall with 99.64% followed by the Cochrane sensitivity-maximizing RCT filter in Ovid MEDLINE with 98.73% and the Cochrane Embase RCT filter with 98.46%. However, the Cochrane RCT Classifier had a higher NNR than the RCT filters in all case studies. Combining the RCT filters with the Cochrane RCT Classifier reduced NNR compared to using the RCT filters alone while achieving a recall of 98.46% for the Ovid MEDLINE/RCT Classifier combination and 98.28% for the Embase/RCT Classifier combination. In conclusion, we found that the Cochrane RCT Classifier in Covidence has a higher recall than established search filters but also a higher NNR. Thus, using the Cochrane RCT Classifier instead of current state-of-the-art RCT filters would lead to an increased workload in the screening process. A viable option with a lower NNR than RCT filters, at the cost of a slight decrease in recall, is to combine the Cochrane RCT Classifier with RCT filters in database searches.
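
As a rough illustration of the two quantities compared in this abstract, the sketch below computes recall and number needed to read (NNR). It assumes NNR is the number of retrieved records divided by the number of relevant records among them (the reciprocal of precision). The count of 1,099 retrieved RCTs is back-calculated from the reported 99.64% recall on the 1,103-record gold standard and is not stated in the abstract; the NNR inputs are invented, and the function names are hypothetical.

```python
def recall(found_relevant, total_relevant):
    """Share of gold-standard records that the filter or classifier retrieves."""
    return found_relevant / total_relevant


def number_needed_to_read(retrieved, retrieved_relevant):
    """Records a reviewer reads per relevant record found (assumed: 1 / precision)."""
    return retrieved / retrieved_relevant


# 1,099 is inferred from the reported 99.64% recall; it is an assumption.
classifier_recall = recall(1099, 1103)
print(f"recall = {100 * classifier_recall:.2f}%")

# Invented numbers: 500 records retrieved, 50 of them relevant.
print(f"NNR = {number_needed_to_read(500, 50):.1f}")
```

A method can therefore lead on recall while still costing more reading time, because NNR depends on how many irrelevant records are retrieved alongside the relevant ones.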

Research Synthesis Methods, vol. 16, issue 6, pp. 953-960. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12657657/pdf/
Citations: 0
Simple imputation method for meta-analysis of survival rates when precision information is missing.
IF 6.1 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-11-01 Epub Date: 2025-09-11 DOI: 10.1017/rsm.2025.10024
Kazushi Maruo, Yusuke Yamaguchi, Ryota Ishii, Hisashi Noma, Masahiko Gosho

In meta-analyses of survival rates, precision information (i.e., standard errors (SEs) or confidence intervals) is often missing in clinical studies. In current practice, such studies are often excluded from the synthesis analyses. However, the naïve deletion of these incomplete data can produce serious biases and loss of precision in pooled estimators. To address these issues, we developed a simple but effective method to impute precision information using commonly available statistics from individual studies, such as sample size, number of events, and risk set size at a time point of interest. By applying this new method, we can effectively circumvent the deletion of incomplete data and the resulting biases and losses of precision. Based on extensive simulation studies, the developed method markedly improves the accuracy and precision of the pooled estimators compared to those of naïve analyses that delete studies with missing precision. Furthermore, the performance of the proposed method was not significantly inferior to the ideal case, where there was no missing precision information. However, for studies for which the risk set size at the time of interest was not available, the proposed method runs the risk of overestimating the SE. Although the proposed method is a single-imputation method, the simulations show that there is no underestimation bias of the SE, even though the proposed method does not consider the uncertainty of missing values. To demonstrate the robustness of the proposed method, we applied it to a systematic review of radiotherapy data. An R package was developed to implement the proposed procedure.

Research Synthesis Methods, vol. 16, issue 6, pp. 937-952. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12657670/pdf/
Citations: 0
Using large language models to directly screen electronic databases as an alternative to traditional search strategies such as the Cochrane highly sensitive search for filtering randomized controlled trials in systematic reviews.
IF 6.1 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-11-01 Epub Date: 2025-10-10 DOI: 10.1017/rsm.2025.10034
Viet-Thi Tran, Carolina Grana Possamai, Isabelle Boutron, Philippe Ravaud

A critical step in systematic reviews involves the definition of a search strategy, with keywords and Boolean logic, to filter electronic databases. We hypothesize that it is possible to screen articles in electronic databases using large language models (LLMs) as an alternative to search equations. To investigate this matter, we compared two methods to identify randomized controlled trials (RCTs) in electronic databases: filtering databases using the Cochrane highly sensitive search and an assessment by an LLM. We retrieved studies indexed in PubMed with a publication date between September 1 and September 30, 2024 using the sole keyword "diabetes." We compared the performance of the Cochrane highly sensitive search and the assessment of all titles and abstracts extracted directly from the database by GPT-4o-mini to identify RCTs. The reference standard was the manual screening of retrieved articles by two independent reviewers. The search retrieved 6377 records, of which 210 (3.5%) were primary reports of RCTs. The Cochrane highly sensitive search filtered 2197 records and missed one RCT (sensitivity 99.5%, 95% CI 97.4% to 100%; specificity 67.8%, 95% CI 66.6% to 68.9%). Assessment of all titles and abstracts from the electronic database by GPT filtered 1080 records and included all 210 primary reports of RCTs (sensitivity 100%, 95% CI 98.3% to 100%; specificity 85.9%, 95% CI 85.0% to 86.8%). LLMs can screen all articles in electronic databases to identify RCTs as an alternative to the Cochrane highly sensitive search. This calls for the evaluation of LLMs as an alternative to rigid search strategies.
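
The reported sensitivities and specificities can be checked from the counts given in this abstract. The sketch below assumes that "filtered N records" means N records were retained as potential RCTs, so false positives equal the retained records minus the true RCTs among them; that reading is an assumption, although it reproduces the reported specificities. The function names are hypothetical. The closed-form lower bound `(alpha/2) ** (1/n)` is the exact (Clopper-Pearson) limit for the special case in which all n positives are detected; the abstract does not say which interval method was used, but this formula gives a value consistent with the reported 98.3%.

```python
def sensitivity_specificity(n_total, n_positive, n_flagged, n_missed):
    """Derive sensitivity and specificity from the counts in the abstract.

    n_flagged: records retained by the method (assumed meaning of "filtered").
    n_missed:  true RCTs that the method failed to retain.
    """
    n_negative = n_total - n_positive
    true_pos = n_positive - n_missed
    false_pos = n_flagged - true_pos
    true_neg = n_negative - false_pos
    return true_pos / n_positive, true_neg / n_negative


def exact_lower_bound_all_detected(n, alpha=0.05):
    """Exact two-sided lower confidence limit when all n positives are detected."""
    return (alpha / 2) ** (1 / n)


cochrane_sens, cochrane_spec = sensitivity_specificity(6377, 210, 2197, 1)
gpt_sens, gpt_spec = sensitivity_specificity(6377, 210, 1080, 0)

print(f"Cochrane search: sensitivity {cochrane_sens:.1%}, specificity {cochrane_spec:.1%}")
print(f"GPT-4o-mini:     sensitivity {gpt_sens:.1%}, specificity {gpt_spec:.1%}")
print(f"Lower 95% limit for 210/210: {exact_lower_bound_all_detected(210):.1%}")
```

Both methods are almost equally sensitive here; the difference lies in specificity, where the LLM retains roughly half as many records for manual screening.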

Research Synthesis Methods, vol. 16, issue 6, pp. 1035-1041. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12657644/pdf/
Citations: 0