
Latest publications in Research Synthesis Methods

Exploring the methodological quality and risk of bias in 200 systematic reviews: A comparative study of ROBIS and AMSTAR-2 tools.
IF 6.1 · CAS Tier 2 (Biology) · Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY · Pub Date: 2026-01-01 · Epub Date: 2025-10-27 · DOI: 10.1017/rsm.2025.10032
Carole Lunny, Nityanand Jain, Tina Nazari, Melodi Kosaner-Kließ, Lucas Santos, Ian Goodman, Alaa A M Osman, Stefano Berrone, Mohammad Najm Dadam, Connor T A Brenna, Heba Hussein, Gioia Dahdal, Diana Cespedes A, Nicola Ferri, Salmaan Kanji, Yuan Chi, Dawid Pieper, Beverly Shea, Amanda Parker, Dipika Neupane, Paul A Khan, Daniella Rangira, Kat Kolaski, Ben Ridley, Amina Berour, Kevin Sun, Radin Hamidi Rad, Zihui Ouyang, Emma K Reid, Iván Pérez-Neri, Sanabel O Barakat, Silvia Bargeri, Silvia Gianola, Greta Castellini, Sera Whitelaw, Adrienne Stevens, Shailesh B Kolekar, Kristy Wong, Paityn Major, Ebrahim Bagheri, Andrea C Tricco

AMSTAR-2 (A Measurement Tool to Assess Systematic Reviews, version 2) and ROBIS are tools used to assess the methodological quality and risk of bias of a systematic review (SR). We applied AMSTAR-2 and ROBIS to a sample of 200 published SRs. We investigated the overlap in their methodological constructs, their responses by item and overall, percentage agreement, direction of effect, and timing of assessments. AMSTAR-2 contains 16 items and ROBIS 24. Three items in AMSTAR-2 and nine in ROBIS did not overlap in construct. Of the 200 SRs, 73% were rated low or critically low quality using AMSTAR-2, and 81% were rated high risk of bias using ROBIS. The median time to complete AMSTAR-2 and ROBIS was 51 and 64 minutes, respectively. When assessment times were calibrated to the number of items in each tool, items took an average of 3.2 minutes for AMSTAR-2 compared with 2.7 minutes for ROBIS. Nine percent of SRs had opposing ratings (i.e., high quality on AMSTAR-2 but high risk of bias on ROBIS). In both tools, three-quarters of items showed more than 70% agreement between raters after extensive training and piloting. AMSTAR-2 and ROBIS provide complementary rather than interchangeable assessments of systematic reviews. AMSTAR-2 may be preferable when efficiency is prioritized and methodological rigour is the focus, whereas ROBIS offers a deeper examination of potential biases and external validity. Given the widespread reliance on systematic reviews for policy and practice, selecting the appropriate appraisal tool remains crucial. Future research should explore strategies to integrate the strengths of both instruments while minimizing the burden on assessors.
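The per-item timing above is simple arithmetic; a minimal Python sketch, using the item counts and median completion times stated in the abstract:

```python
# Per-item assessment time: median completion time divided by the number
# of items in each tool (16 for AMSTAR-2, 24 for ROBIS).

def minutes_per_item(total_minutes: float, n_items: int) -> float:
    """Average minutes per item when total time is spread evenly across items."""
    return round(total_minutes / n_items, 1)

print(minutes_per_item(51, 16))  # AMSTAR-2 -> 3.2
print(minutes_per_item(64, 24))  # ROBIS    -> 2.7
```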

Citations: 0
Automating the data extraction process for systematic reviews using GPT-4o and o3.
IF 6.1 · CAS Tier 2 (Biology) · Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY · Pub Date: 2026-01-01 · Epub Date: 2025-09-17 · DOI: 10.1017/rsm.2025.10030
Yuki Kataoka, Tomohiro Takayama, Keisuke Yoshimura, Ryuhei So, Yasushi Tsujimoto, Yosuke Yamagishi, Shiro Takagi, Yuki Furukawa, Masatsugu Sakata, Đorđe Bašić, Andrea Cipriani, Pim Cuijpers, Eirini Karyotaki, Mathias Harrer, Stefan Leucht, Ava Homiar, Edoardo G Ostinelli, Clara Miguel, Alessandro Rodolico, Toshi A Furukawa

Large language models have shown promise for automating data extraction (DE) in systematic reviews (SRs), but most existing approaches require manual interaction. We developed an open-source system using GPT-4o to automatically extract data with no human intervention during the extraction process. We developed the system on a dataset of 290 randomized controlled trials (RCTs) from a published SR about cognitive behavioral therapy for insomnia. We evaluated the system on two other datasets: 5 RCTs from an updated search for the same review and 10 RCTs used in a separate published study that had also evaluated automated DE. We developed the best approach across all variables in the development dataset using GPT-4o. On the updated-search dataset using o3, the system achieved 74.9% sensitivity, 76.7% specificity, 75.7% precision, 93.5% variable detection comprehensiveness, and 75.3% accuracy. In both datasets, accuracy was higher for string variables (e.g., country, study design, drug names, and outcome definitions) than for numeric variables. In the third, external validation dataset, GPT-4o showed lower performance, with a mean accuracy of 84.4%, compared with the previous study. However, by adjusting our DE method while maintaining the same prompting technique, we achieved a mean accuracy of 96.3%, comparable to the previous manual extraction study. Our system shows potential for assisting the DE of string variables alongside a human reviewer. However, it cannot yet replace humans for numeric DE. Further evaluation across diverse review contexts is needed to establish broader applicability.
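The performance figures above follow the standard confusion-matrix definitions; a minimal sketch with made-up counts (not the study's data) chosen only to illustrate the formulas:

```python
# Confusion-matrix metrics for evaluating automated data extraction.
# The counts below are invented illustration values, not the study's results.

def metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    return {
        "sensitivity": tp / (tp + fn),           # share of true values recovered
        "specificity": tn / (tn + fp),           # share of absent values correctly left blank
        "precision": tp / (tp + fp),             # share of extracted values that are correct
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

m = metrics(tp=75, fp=24, tn=77, fn=25)
print({k: round(v, 3) for k, v in m.items()})
```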

Citations: 0
What can we learn from 1,000 meta-analyses across 10 different disciplines?
IF 6.1 · CAS Tier 2 (Biology) · Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY · Pub Date: 2026-01-01 · Epub Date: 2025-10-02 · DOI: 10.1017/rsm.2025.10035
Weilun Wu, Jianhua Duan, W Robert Reed, Elizabeth Tipton

This study analyzes 1,000 meta-analyses drawn from 10 disciplines (including medicine, psychology, education, biology, and economics) to document and compare methodological practices across fields. We find large differences in the size of meta-analyses, the number of effect sizes per study, and the types of effect sizes used. Disciplines also vary in their use of unpublished studies, the frequency and type of tests for publication bias, and whether they attempt to correct for it. Notably, many meta-analyses include multiple effect sizes from the same study, yet fail to account for statistical dependence in their analyses. We document the limited use of advanced methods, such as multilevel models and cluster-adjusted standard errors, that can accommodate dependent data structures. Correlations are frequently used as effect sizes in some disciplines, yet researchers often fail to address the methodological issues this introduces, including biased weighting and misleading tests for publication bias. We also find that meta-regression is underutilized, even when sample sizes are large enough to support it. This work serves as a resource for researchers conducting their first meta-analyses, as a benchmark for researchers designing simulation experiments, and as a reference for applied meta-analysts aiming to improve their methodological practices.
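For readers conducting a first meta-analysis, the core pooling operation the survey refers to can be sketched in a few lines; a fixed-effect inverse-variance weighting sketch, with illustrative effect sizes and variances (not data from the surveyed meta-analyses):

```python
# Fixed-effect inverse-variance pooling: each study's effect is weighted by
# the inverse of its variance, and the pooled standard error follows from
# the summed weights. Numbers are illustrative only.
import math

def pool(effects, variances):
    """Inverse-variance weighted mean effect and its standard error."""
    weights = [1 / v for v in variances]
    est = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return est, se

est, se = pool([0.2, 0.5, 0.3], [0.04, 0.09, 0.01])
print(round(est, 3), round(se, 3))  # 0.298 0.086
```

Note this simple pooling treats every effect size as independent; as the survey observes, multiple effect sizes from the same study violate that assumption, which is what multilevel models and cluster-adjusted standard errors address.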

Citations: 0
Incorporating the possibility of cure into network meta-analyses: A case study from resected Stage III/IV melanoma.
IF 6.1 · CAS Tier 2 (Biology) · Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY · Pub Date: 2026-01-01 · Epub Date: 2025-10-15 · DOI: 10.1017/rsm.2025.10038
Keith Chan, Sarah Goring, Kabirraaj Toor, Murat Kurt, Andriy Moshyk, Jeroen Jansen

In many areas of oncology, cancer drugs are now associated with long-term survivorship, and mixture cure models (MCMs) are increasingly used for survival analysis. The objective of this article was to propose a methodology for conducting network meta-analysis (NMA) of MCMs. The method is illustrated through a case study evaluating recurrence-free survival (RFS) with adjuvant therapy for resected stage III/IV melanoma. For the case study, the MCM NMA was conducted by: (1) fitting MCMs to each trial included within the network of evidence; and (2) incorporating the parameters of the MCMs into a multivariate NMA. Outputs included relative effect estimates from the MCM NMA as well as absolute estimates of survival (RFS), modeled within the Bayesian multivariate NMA by incorporating absolute baseline effects of the reference treatment. The case study is intended to illustrate the MCM NMA methodology and is not meant for clinical interpretation. It demonstrated the feasibility of conducting an MCM NMA and highlighted key issues and considerations when conducting such analyses, including the plausibility of cure, maturity of the data, the process for model selection, and the presentation and interpretation of results. MCM NMA provides a method of comparative survival analysis that acknowledges the benefit newer treatments may confer on a subset of patients, resulting in long-term survival that is reflected in extrapolation. In the future, this method may provide an additional metric for comparing treatments that is of value to patients.
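The defining feature of an MCM is a cured fraction that never experiences the event, so overall survival is S(t) = pi + (1 - pi) * S_u(t). A minimal sketch, assuming an exponential distribution for the uncured survival S_u (the abstract does not specify which parametric families were fitted):

```python
# Mixture cure model survival: a fraction pi is cured (never recurs), while
# the uncured fraction follows a standard survival distribution. Exponential
# uncured survival is an illustrative assumption here.
import math

def mcm_survival(t: float, pi: float, rate: float) -> float:
    """Overall survival S(t) = pi + (1 - pi) * exp(-rate * t)."""
    s_uncured = math.exp(-rate * t)
    return pi + (1 - pi) * s_uncured

# Survival starts at 1 and plateaus at the cure fraction as t grows large.
print(mcm_survival(0, 0.4, 0.5))    # 1.0
print(mcm_survival(100, 0.4, 0.5))  # ~0.4
```

The plateau is what lets extrapolated survival curves reflect long-term survivorship instead of decaying to zero.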

Citations: 0
StudyTypeTeller-Large language models to automatically classify research study types for systematic reviews.
IF 6.1 · CAS Tier 2 (Biology) · Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY · Pub Date: 2025-11-01 · Epub Date: 2025-09-11 · DOI: 10.1017/rsm.2025.10031
Simona Emilova Doneva, Shirin de Viragh, Hanna Hubarava, Stefan Schandelmaier, Matthias Briel, Benjamin Victor Ineichen

Screening, a labor-intensive aspect of systematic reviews, is increasingly challenging due to the rising volume of scientific publications. Recent advances suggest that generative large language models like the generative pre-trained transformer (GPT) could aid this process by classifying references into study types such as randomized controlled trials (RCTs) or animal studies prior to abstract screening. However, it is unknown how these GPT models perform in classifying such scientific study types in the biomedical field. Additionally, their performance has not been directly compared with earlier transformer-based models like bidirectional encoder representations from transformers (BERT). To address this, we developed a human-annotated corpus of 2,645 PubMed titles and abstracts, annotated for 14 study types, including different types of RCTs and animal studies, systematic reviews, study protocols, case reports, and in vitro studies. Using this corpus, we compared the performance of GPT-3.5 and GPT-4 against established BERT models in automatically classifying these study types. Our results show that fine-tuned pretrained BERT models consistently outperformed GPT models, achieving F1-scores above 0.8, compared with approximately 0.6 for GPT models. Advanced prompting strategies did not substantially boost GPT performance. In conclusion, these findings highlight that, even though GPT models benefit from advanced capabilities and extensive training data, their performance in niche tasks like scientific multi-class study classification is inferior to that of smaller fine-tuned models. Nevertheless, automated methods remain promising for reducing the volume of records, making the screening of large reference libraries more feasible. Our corpus is openly available and can be used to harness other natural language processing (NLP) approaches.
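The F1-scores quoted above combine precision and recall per study type; a minimal sketch of per-class and macro-averaged F1 (the per-class counts are hypothetical, not the study's results):

```python
# Per-class F1 and its macro average across study-type classes.
# Counts (tp, fp, fn) below are hypothetical illustration values.

def f1(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(per_class_counts):
    """Unweighted mean of per-class F1-scores."""
    scores = [f1(tp, fp, fn) for tp, fp, fn in per_class_counts]
    return sum(scores) / len(scores)

# Two hypothetical study-type classes:
print(round(macro_f1([(90, 10, 10), (40, 20, 10)]), 3))  # 0.814
```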

Citations: 0
Optimal large language models to screen citations for systematic reviews.
IF 6.1 · CAS Tier 2 (Biology) · Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY · Pub Date: 2025-11-01 · Epub Date: 2025-06-23 · DOI: 10.1017/rsm.2025.10014
Takehiko Oami, Yohei Okada, Taka-Aki Nakada

Recent studies highlight the potential of large language models (LLMs) in citation screening for systematic reviews; however, the efficiency of individual LLMs for this application remains unclear. This study aimed to compare accuracy, time-related efficiency, cost, and consistency across four LLMs (GPT-4o, Gemini 1.5 Pro, Claude 3.5 Sonnet, and Llama 3.3 70B) for literature screening tasks. The models screened for clinical questions from the Japanese Clinical Practice Guidelines for the Management of Sepsis and Septic Shock 2024. Sensitivity and specificity were calculated for each model against conventional citation screening results for qualitative assessment. We also recorded the time and cost of screening and assessed consistency to verify reproducibility. A post hoc analysis explored whether integrating outputs from multiple models could enhance screening accuracy. GPT-4o and Llama 3.3 70B achieved high specificity but lower sensitivity, while Gemini 1.5 Pro and Claude 3.5 Sonnet exhibited higher sensitivity at the cost of lower specificity. Citation screening times and costs varied, with GPT-4o being the fastest and Llama 3.3 70B the most cost-effective. Consistency was comparable among the models. An ensemble approach combining model outputs improved sensitivity but increased the number of false positives, requiring additional review effort. Each model demonstrated distinct strengths, effectively streamlining citation screening by saving time and reducing workload. However, reviewing false positives remains a challenge. Combining models may enhance sensitivity, indicating the potential of LLMs to optimize systematic review workflows.
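The ensemble idea above can be sketched as a simple OR rule over per-model include/exclude votes: a citation is kept if any model includes it, which raises sensitivity (fewer missed studies) at the cost of more false positives to review. The votes below are illustrative, not the study's screening decisions:

```python
# Union (OR-rule) ensemble over per-model screening decisions.
# True = "include this citation for full-text review".

def union_screen(votes_per_model):
    """Keep a citation if ANY model votes to include it."""
    n = len(votes_per_model[0])
    return [any(votes[i] for votes in votes_per_model) for i in range(n)]

model_a = [True, False, False, True]   # hypothetical decisions, model A
model_b = [False, True, False, True]   # hypothetical decisions, model B
print(union_screen([model_a, model_b]))  # [True, True, False, True]
```

A stricter AND rule would instead trade sensitivity for specificity, shrinking the review workload but risking missed studies.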

Citations: 0
NMAsurv: An R Shiny application for network meta-analysis based on survival data.
IF 6.1 CAS Tier 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-11-01 Epub Date: 2025-07-10 DOI: 10.1017/rsm.2025.10020
Taihang Shao, Mingye Zhao, Fenghao Shi, Mingjun Rui, Wenxi Tang

Network meta-analysis (NMA) is becoming increasingly important, especially in the field of medicine, as it allows for comparisons across multiple trials with different interventions. For time-to-event data, that is, survival data, traditional NMA based on the proportional hazards (PH) assumption simply synthesizes reported hazard ratios (HRs). Novel methods for NMA based on the non-PH assumption have been proposed and implemented using R software. However, these methods often involve complex methodologies and require advanced programming skills, creating a barrier for many researchers. Therefore, we developed an R Shiny tool, NMAsurv (https://psurvivala.shinyapps.io/NMAsurv/). NMAsurv allows users with little or zero background in R to conduct survival-data-based NMA effortlessly. The tool supports various functions such as drawing network plots, testing the PH assumption, and building NMA models. Users can input either reconstructed pseudo-individual participant data or aggregated data. NMAsurv offers a user-friendly interface for extracting parameter estimations from various NMA models, including fractional polynomial, piecewise exponential models, parametric survival models, Cox PH model, and generalized gamma model. Additionally, it enables users to effortlessly create survival and HR plots. All operations can be performed by an intuitive "point-and-click" interface. In this study, we introduce all the functionalities and features of NMAsurv and demonstrate its application using a real-world NMA example.
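A minimal sketch of the non-PH problem the abstract alludes to (this is not NMAsurv code; the Weibull parameters are invented): when two arms have Weibull hazards with different shape parameters, the hazard ratio changes with follow-up time, so no single pooled HR summarizes the comparison.

```python
# Weibull hazard: h(t) = (k / lam) * (t / lam) ** (k - 1)
def weibull_hazard(t, shape, scale):
    return (shape / scale) * (t / scale) ** (shape - 1)

def hazard_ratio(t, arm1, arm2):
    return weibull_hazard(t, *arm1) / weibull_hazard(t, *arm2)

treatment = (0.8, 12.0)  # shape < 1: hazard decreases over time (hypothetical)
control   = (1.5, 10.0)  # shape > 1: hazard increases over time (hypothetical)

hrs = [hazard_ratio(t, treatment, control) for t in (1.0, 6.0, 24.0)]
# Early on the treatment arm looks worse (HR > 1); later it looks better
# (HR < 1). The HR crosses 1, so the PH assumption fails here, which is
# why non-PH models (fractional polynomial, piecewise exponential, etc.)
# are offered alongside the Cox PH model.
```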

{"title":"NMAsurv: An R Shiny application for network meta-analysis based on survival data.","authors":"Taihang Shao, Mingye Zhao, Fenghao Shi, Mingjun Rui, Wenxi Tang","doi":"10.1017/rsm.2025.10020","DOIUrl":"10.1017/rsm.2025.10020","url":null,"abstract":"<p><p>Network meta-analysis (NMA) is becoming increasingly important, especially in the field of medicine, as it allows for comparisons across multiple trials with different interventions. For time-to-event data, that is, survival data, traditional NMA based on the proportional hazards (PH) assumption simply synthesizes reported hazard ratios (HRs). Novel methods for NMA based on the non-PH assumption have been proposed and implemented using R software. However, these methods often involve complex methodologies and require advanced programming skills, creating a barrier for many researchers. Therefore, we developed an R Shiny tool, NMAsurv (https://psurvivala.shinyapps.io/NMAsurv/). NMAsurv allows users with little or zero background in R to conduct survival-data-based NMA effortlessly. The tool supports various functions such as drawing network plots, testing the PH assumption, and building NMA models. Users can input either reconstructed pseudo-individual participant data or aggregated data. NMAsurv offers a user-friendly interface for extracting parameter estimations from various NMA models, including fractional polynomial, piecewise exponential models, parametric survival models, Cox PH model, and generalized gamma model. Additionally, it enables users to effortlessly create survival and HR plots. All operations can be performed by an intuitive \"point-and-click\" interface. 
In this study, we introduce all the functionalities and features of NMAsurv and demonstrate its application using a real-world NMA example.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"16 6","pages":"1042-1056"},"PeriodicalIF":6.1,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12657653/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
What happens to qualitative studies initially presented as conference abstracts: A survey among study authors.
IF 6.1 CAS Tier 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-11-01 Epub Date: 2025-09-05 DOI: 10.1017/rsm.2025.10033
Marwin Weber, Simon Lewin, Joerg J Meerpohl, Heather Menzies Munthe-Kaas, Rigmor Berg, Andrew Booth, Claire Glenton, Jane Noyes, Ingrid Toews

Qualitative research addresses important healthcare questions, including patients' experiences with interventions. Qualitative evidence syntheses combine findings from individual studies and are increasingly used to inform health guidelines. However, dissemination bias (the selective non-dissemination of studies or findings) may distort the body of evidence. This study examined reasons for the non-dissemination of qualitative studies. We identified conference abstracts reporting qualitative, health-related studies. We invited authors to answer a survey containing quantitative and qualitative questions. We performed descriptive analyses on the quantitative data and inductive thematic analysis on the qualitative data. Most of the 142 respondents were female, established researchers. About a third reported that their study had not been published in full after their conference presentation. The main reasons were time constraints, career changes, and a lack of interest. Few indicated non-publication due to the nature of the study findings. Decisions not to publish were largely made by author teams. Half of the 72% who published their study reported that all findings were included in the publication. This study highlights researchers' reasons for non-dissemination of qualitative research. One-third of studies presented as conference abstracts remained unpublished, but non-dissemination was rarely linked to the study findings. Further research is needed to understand the systematic non-dissemination of qualitative studies.

{"title":"What happens to qualitative studies initially presented as conference abstracts: A survey among study authors.","authors":"Marwin Weber, Simon Lewin, Joerg J Meerpohl, Heather Menzies Munthe-Kaas, Rigmor Berg, Andrew Booth, Claire Glenton, Jane Noyes, Ingrid Toews","doi":"10.1017/rsm.2025.10033","DOIUrl":"10.1017/rsm.2025.10033","url":null,"abstract":"<p><p>Qualitative research addresses important healthcare questions, including patients' experiences with interventions. Qualitative evidence syntheses combine findings from individual studies and are increasingly used to inform health guidelines. However, dissemination bias-selective non-dissemination of studies or findings-may distort the body of evidence. This study examined reasons for the non-dissemination of qualitative studies. We identified conference abstracts reporting qualitative, health-related studies. We invited authors to answer a survey containing quantitative and qualitative questions. We performed descriptive analyses on the quantitative data and inductive thematic analysis on the qualitative data. Most of the 142 respondents were female, established researchers. About a third reported that their study had not been published in full after their conference presentation. The main reasons were time constraints, career changes, and a lack of interest. Few indicated non-publication due to the nature of the study findings. Decisions not to publish were largely made by author teams. Half of the 72% who published their study reported that all findings were included in the publication. This study highlights researchers' reasons for non-dissemination of qualitative research. One-third of studies presented as conference abstracts remained unpublished, but non-dissemination was rarely linked to the study findings. 
Further research is needed to understand the systematic non-dissemination of qualitative studies.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"16 6","pages":"1025-1034"},"PeriodicalIF":6.1,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12657646/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Trials and triangles: Network meta-analysis of multi-arm trials with correlated arms.
IF 6.1 CAS Tier 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-11-01 Epub Date: 2025-08-01 DOI: 10.1017/rsm.2025.10026
Gerta Rücker, Guido Schwarzer

For network meta-analysis (NMA), we usually assume that the treatment arms are independent within each included trial. This assumption is justified for parallel design trials and leads to a property we call consistency of variances for both multi-arm trials and NMA estimates. However, the assumption is violated for trials with correlated arms, for example, split-body trials. For multi-arm trials with correlated arms, the variance of a contrast is not the sum of the arm-based variances, but comes with a correlation term. This may lead to violations of variance consistency, and the inconsistency of variances may even propagate to the NMA estimates. We explain this using a geometric analogy where three-arm trials correspond to triangles and four-arm trials correspond to tetrahedrons. We also investigate which information has to be extracted for a multi-arm trial with correlated arms and provide an algorithm to analyze NMAs including such trials.
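The variance identity at stake can be checked numerically. A small sketch with invented numbers for a three-arm trial (arms A, B, C): for independent arms, the contrast variances satisfy var_AC = var_AB + var_BC - 2*var_B, and a within-trial correlation term breaks this identity.

```python
def contrast_var(v1, v2, cov12):
    # Var(X2 - X1) = Var(X1) + Var(X2) - 2 * Cov(X1, X2)
    return v1 + v2 - 2.0 * cov12

vA, vB, vC = 0.10, 0.12, 0.08  # invented arm-level variances

# Parallel design: arms independent, covariances zero.
v_ab = contrast_var(vA, vB, 0.0)   # 0.22
v_bc = contrast_var(vB, vC, 0.0)   # 0.20
v_ac = contrast_var(vA, vC, 0.0)   # 0.18
assert abs(v_ac - (v_ab + v_bc - 2 * vB)) < 1e-12  # variance consistency holds

# Split-body-style design: arms A and B share participants, so cov(A, B) > 0.
v_ab_corr = contrast_var(vA, vB, 0.05)  # 0.12: the correlation shrinks the variance
# The same identity now fails: 0.12 + 0.20 - 2*0.12 = 0.08, but v_ac = 0.18.
assert abs(v_ac - (v_ab_corr + v_bc - 2 * vB)) > 1e-12
```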

{"title":"Trials and triangles: Network meta-analysis of multi-arm trials with correlated arms.","authors":"Gerta Rücker, Guido Schwarzer","doi":"10.1017/rsm.2025.10026","DOIUrl":"10.1017/rsm.2025.10026","url":null,"abstract":"<p><p>For network meta-analysis (NMA), we usually assume that the treatment arms are independent within each included trial. This assumption is justified for parallel design trials and leads to a property we call consistency of variances for both multi-arm trials and NMA estimates. However, the assumption is violated for trials with correlated arms, for example, split-body trials. For multi-arm trials with correlated arms, the variance of a contrast is not the sum of the arm-based variances, but comes with a correlation term. This may lead to violations of variance consistency, and the inconsistency of variances may even propagate to the NMA estimates. We explain this using a geometric analogy where three-arm trials correspond to triangles and four-arm trials correspond to tetrahedrons. We also investigate which information has to be extracted for a multi-arm trial with correlated arms and provide an algorithm to analyze NMAs including such trials.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"16 6","pages":"961-974"},"PeriodicalIF":6.1,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12657662/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Knowledge user involvement is still uncommon in published rapid reviews - a meta-research cross-sectional study.
IF 6.1 CAS Tier 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-11-01 Epub Date: 2025-07-10 DOI: 10.1017/rsm.2025.10018
Barbara Nussbaumer-Streit, Dominic Ledinger, Christina Kien, Irma Klerings, Emma Persad, Andrea Chapman, Claus Nowak, Arianna Gadinger, Lisa Affengruber, Maureen Smith, Gerald Gartlehner, Ursula Griebler

Background: Involving knowledge users (KUs) such as patients, clinicians, or health policymakers is particularly relevant when conducting rapid reviews (RRs), as they should be tailored to decision-makers' needs. However, little is known about how common KU involvement currently is in RRs.

Objectives: We wanted to assess the proportion of KU involvement reported in recently published RRs (2021 onwards), which groups of KUs were involved in each phase of the RR process, to what extent, and which factors were associated with KU involvement in RRs.

Methods: We conducted a meta-research cross-sectional study. A systematic literature search in Ovid MEDLINE and Epistemonikos in January 2024 identified 2,493 unique records. We dually screened the identified records (partly with assistance from an artificial intelligence (AI)-based application) until we reached the a priori calculated sample size of 104 RRs. We dually extracted data and analyzed it descriptively.

Results: The proportion of RRs that reported KU involvement was 19% (95% confidence interval [CI]: 12%-28%). Most often, KUs were involved during the initial preparation of the RR, the systematic searches, and the interpretation and dissemination of results. Researchers/content experts and public/patient partners were the KU groups most often involved. KU involvement was more common in RRs focusing on patient involvement/shared decision-making, having a published protocol, and being commissioned.

Conclusions: Reporting KU involvement in published RRs is uncommon and often vague. Future research should explore barriers and facilitators for KU involvement and its reporting in RRs. Guidance regarding reporting on KU involvement in RRs is needed.
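As an aside on the Results figure of 19% (95% CI: 12%-28%): a confidence interval for a proportion of this kind can be computed with the Wilson score interval. The sketch below assumes a hypothetical split of 20 of 104 RRs (consistent with the reported 19%); the paper's exact count and CI method are not stated here and may differ.

```python
import math

def wilson_ci(k, n, z=1.96):
    # Wilson score interval for a binomial proportion k/n.
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Hypothetical: 20 of 104 sampled rapid reviews reported KU involvement.
lo, hi = wilson_ci(20, 104)
print(f"{20 / 104:.0%} (95% CI: {lo:.0%}-{hi:.0%})")
```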

{"title":"Knowledge user involvement is still uncommon in published rapid reviews-a meta-research cross-sectional study.","authors":"Barbara Nussbaumer-Streit, Dominic Ledinger, Christina Kien, Irma Klerings, Emma Persad, Andrea Chapman, Claus Nowak, Arianna Gadinger, Lisa Affengruber, Maureen Smith, Gerald Gartlehner, Ursula Griebler","doi":"10.1017/rsm.2025.10018","DOIUrl":"10.1017/rsm.2025.10018","url":null,"abstract":"<p><strong>Background: </strong>Involving knowledge users (KUs) such as patients, clinicians, or health policymakers is particularly relevant when conducting rapid reviews (RRs), as they should be tailored to decision-makers' needs. However, little is known about how common KU involvement currently is in RRs.</p><p><strong>Objectives: </strong>We wanted to assess the proportion of KU involvement reported in recently published RRs (2021 onwards), which groups of KUs were involved in each phase of the RR process, to what extent, and which factors were associated with KU involvement in RRs.</p><p><strong>Methods: </strong>We conducted a meta-research cross-sectional study. A systematic literature search in Ovid MEDLINE and Epistemonikos in January 2024 identified 2,493 unique records. We dually screened the identified records (partly with assistance from an artificial intelligence (AI)-based application) until we reached the a priori calculated sample size of 104 RRs. We dually extracted data and analyzed it descriptively.</p><p><strong>Results: </strong>The proportion of RRs that reported KU involvement was 19% (95% confidence interval [CI]: 12%-28%). Most often, KUs were involved during the initial preparation of the RR, the systematic searches, and the interpretation and dissemination of results. Researchers/content experts and public/patient partners were the KU groups most often involved. 
KU involvement was more common in RRs focusing on patient involvement/shared decision-making, having a published protocol, and being commissioned.</p><p><strong>Conclusions: </strong>Reporting KU involvement in published RRs is uncommon and often vague. Future research should explore barriers and facilitators for KU involvement and its reporting in RRs. Guidance regarding reporting on KU involvement in RRs is needed.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"16 6","pages":"876-899"},"PeriodicalIF":6.1,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12657652/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0