Pub Date: 2025-09-01 | Epub Date: 2025-06-16 | DOI: 10.1017/rsm.2025.24
Zheng Wang, Thomas A Murray, Wenshan Han, Lifeng Lin, Lianne K Siegel, Haitao Chu
Network meta-analysis (NMA) enables simultaneous assessment of multiple treatments by combining both direct and indirect evidence. While NMAs are increasingly important in healthcare decision-making, challenges remain due to limited direct comparisons between treatments. This data sparsity complicates the accurate estimation of correlations among treatments in arm-based NMA (AB-NMA). To address these challenges, we introduce a novel sensitivity analysis tool tailored for AB-NMA. This study pioneers a tipping point analysis within a Bayesian framework, specifically targeting correlation parameters to assess their influence on the robustness of conclusions about relative treatment effects. The analysis explores changes in the conclusion based on whether the 95% credible interval includes the null value (referred to as the interval conclusion) and on the magnitude of point estimates. Applying this approach to multiple NMA datasets, including 112 treatment pairs, we identified tipping points in 13 pairs (11.6%) for interval conclusion change and in 29 pairs (25.9%) for magnitude change with a 15% threshold. These findings suggest that tipping points may be common and emphasize the importance of our proposed analysis, especially in networks with sparse direct comparisons or wide credible intervals for correlation estimates. A case study provides a visual illustration and interpretation of the tipping point analysis. We recommend integrating this tipping point analysis as a standard practice in AB-NMA.
Title: Tipping point analysis in network meta-analysis | Journal: Research Synthesis Methods, 16(5), 797-812 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12527527/pdf/
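The tipping point idea can be illustrated with a deliberately simplified, non-Bayesian toy. Assume that the relative effect of treatment B versus A is the difference of two arm-level estimates. Their standard deviations are `sd_a` and `sd_b`, and their correlation `rho` is unknown, so the variance of the difference is sd_a^2 + sd_b^2 - 2 * rho * sd_a * sd_b. The sketch scans `rho` over [-1, 1] and records where the 95% interval starts or stops excluding the null. This gives a toy analogue of an "interval conclusion" tipping point. It is a sketch under these assumptions, not the authors' Bayesian AB-NMA model, and all function names and inputs are hypothetical.

```python
import math


def contrast_interval(mu_a, mu_b, sd_a, sd_b, rho, z=1.96):
    """Normal-approximation 95% interval for the relative effect
    d = mu_b - mu_a when the two arm-level estimates have correlation rho."""
    var_d = sd_a ** 2 + sd_b ** 2 - 2.0 * rho * sd_a * sd_b
    half_width = z * math.sqrt(max(var_d, 0.0))  # guard against rounding below 0
    d = mu_b - mu_a
    return d - half_width, d + half_width


def interval_excludes_null(lower, upper, null=0.0):
    """The 'interval conclusion': True if the interval excludes the null."""
    return not (lower <= null <= upper)


def tipping_points(mu_a, mu_b, sd_a, sd_b, grid_size=201):
    """Scan rho over an even grid on [-1, 1] and return the grid values at
    which the interval conclusion differs from the previous grid value."""
    flips = []
    previous = None
    for i in range(grid_size):
        rho = -1.0 + 2.0 * i / (grid_size - 1)
        lower, upper = contrast_interval(mu_a, mu_b, sd_a, sd_b, rho)
        conclusion = interval_excludes_null(lower, upper)
        if previous is not None and conclusion != previous:
            flips.append(rho)
        previous = conclusion
    return flips


# Hypothetical arm-level summaries (not taken from the paper).
print(tipping_points(mu_a=0.0, mu_b=0.5, sd_a=0.2, sd_b=0.2))
```

With these hypothetical inputs, the scan reports a single flip near rho = 0.19. Below that value the interval includes 0; above it the interval excludes 0, because a positive correlation shrinks the variance of the difference.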
Pub Date: 2025-09-01 | Epub Date: 2025-06-18 | DOI: 10.1017/rsm.2025.26
Leonhard Held, Felix Hofmann, Samuel Pawel
P-value functions are modern statistical tools that unify effect estimation and hypothesis testing. Using any of the many available p-value combination procedures (Xie et al., 2011, JASA), they can provide point and interval estimates that serve as alternatives to those of standard meta-analysis methods. We provide a systematic comparison of different combination procedures, both from a theoretical perspective and through simulation. We show that many prominent p-value combination methods (e.g. Fisher's method) are not invariant to the orientation of the underlying one-sided p-values. Only Edgington's method, a lesser-known combination method based on the sum of p-values, is orientation-invariant and still provides confidence intervals not restricted to be symmetric around the point estimate. Adjustments for heterogeneity can also be made, and results from a simulation study indicate that Edgington's method can compete with more standard meta-analytic methods.
Title: A comparison of combined p-value functions for meta-analysis | Journal: Research Synthesis Methods, 16(5), 758-785 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12527540/pdf/
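The orientation-invariance claim can be checked numerically with a small sketch that uses only the Python standard library (3.8+ for `math.comb`):

- Edgington's combined p-value is the Irwin-Hall CDF evaluated at the sum of the p-values.
- Fisher's combined p-value is the upper tail of a chi-square distribution with 2k degrees of freedom.

"Flipping the orientation" is taken here to mean replacing every one-sided p-value p with 1 - p. An orientation-invariant method should then return 1 minus the original combined p-value. These helpers are illustrative and are not the authors' implementation.

```python
import math


def edgington(pvalues):
    """Edgington's method: the combined p-value is P(U_1 + ... + U_k <= s),
    where s is the sum of the k one-sided p-values and the U_i are independent
    Uniform(0, 1) variables (the Irwin-Hall CDF evaluated at s)."""
    k = len(pvalues)
    s = sum(pvalues)
    total = sum((-1) ** j * math.comb(k, j) * (s - j) ** k
                for j in range(math.floor(s) + 1))
    return total / math.factorial(k)


def fisher(pvalues):
    """Fisher's method: X = -2 * sum(log p) is chi-square with 2k degrees of
    freedom under the null; returns the upper tail P(chi2_2k >= X)."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    # Closed-form chi-square survival function for even degrees of freedom.
    return math.exp(-half_x) * sum(half_x ** j / math.factorial(j) for j in range(k))


p = [0.2, 0.3]
flipped = [1 - x for x in p]  # the same evidence, one-sided in the other direction
print(edgington(p), edgington(flipped))  # these two sum to 1
print(fisher(p), fisher(flipped))        # these two do not
```

For p = (0.2, 0.3), Edgington's method gives 0.125 and 0.875 for the two orientations, which sum to 1. Fisher's method gives about 0.229 and 0.885. Edgington's method is invariant because the Irwin-Hall distribution is symmetric about k/2.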
Pub Date: 2025-07-01 | Epub Date: 2025-06-04 | DOI: 10.1017/rsm.2025.10011
Will Robinson, Alex Sutton, Clareece Nevill, Nicola Cooper
Graphical displays are often utilised for high-quality reporting of meta-analyses. Previous work has presented augmentations to funnel plots that assess the impact that an additional trial would have on an existing meta-analysis. However, decision-makers, such as the National Institute for Health and Care Excellence in the United Kingdom, assess health technologies based on their cost-effectiveness, as opposed to efficacy alone. Motivated by this fact, this article outlines a novel approach, developed for augmenting funnel plots, based on the ability of an additional trial to change a decision regarding the optimal intervention. The approach is presented for a generalised class of economic decision models, where the clinical effectiveness of the health technology of interest is informed by a meta-analysis, and is illustrated with an example application. The 'decision contours' produced from the proposed methods have various potential uses not only for decision-makers and research funders but also for other researchers, such as meta-analysts and primary researchers designing new studies, as well as those developing health technologies, such as pharmaceutical companies. The relationship between the new approach and existing methods for determining sample size calculations for future trials is also considered.
Title: Exploring graphical approaches to assess the impact of an additional trial on a decision model via updated meta-analysis | Journal: Research Synthesis Methods, 16(4), 672-687 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12527514/pdf/
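A decision contour asks which new trials would flip the decision once added to the meta-analysis. Each candidate trial is described by its effect estimate and standard error, which are the axes of a funnel plot. The sketch below assumes a fixed-effect inverse-variance meta-analysis. It replaces the full economic decision model with a hypothetical break-even threshold on the pooled effect. The function names, data, and threshold are all illustrative; this is not the authors' method.

```python
import math


def pooled_fixed_effect(effects, ses):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return estimate, math.sqrt(1.0 / sum(weights))


def adopt(pooled_effect, threshold):
    """Hypothetical decision rule: adopt the new technology when the pooled
    effect exceeds the break-even effect of a (hypothetical) economic model."""
    return pooled_effect > threshold


def decision_changes(effects, ses, threshold, new_effects, new_ses):
    """Map each candidate new trial (effect, SE) to True if adding it to the
    meta-analysis would reverse the current adoption decision."""
    current = adopt(pooled_fixed_effect(effects, ses)[0], threshold)
    changes = {}
    for y_new in new_effects:
        for se_new in new_ses:
            updated, _ = pooled_fixed_effect(effects + [y_new], ses + [se_new])
            changes[(y_new, se_new)] = adopt(updated, threshold) != current
    return changes


# Hypothetical existing evidence: two trials, pooled effect 0.25; break-even 0.2.
grid = decision_changes([0.3, 0.2], [0.1, 0.1], threshold=0.2,
                        new_effects=[-0.5, 0.0, 0.5], new_ses=[0.1, 1.0])
for (y_new, se_new), flips in sorted(grid.items()):
    print(f"new effect {y_new:+.1f}, SE {se_new:.1f}: decision changes = {flips}")
```

To trace an approximate decision contour, evaluate `decision_changes` on a fine grid of effects and standard errors. Then plot the boundary between True and False over the funnel plot. Under these assumptions, a precise new trial with a null effect changes the decision, while an imprecise trial with the same effect does not.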
Pub Date: 2025-07-01 | Epub Date: 2025-03-24 | DOI: 10.1017/rsm.2025.15
Adriana López-Pineda, Rauf Nouni-García, Álvaro Carbonell-Soliva, Vicente F Gil-Guillén, Concepción Carratalá-Munuera, Fernando Borrás
With the increasing volume of scientific literature, there is a need to streamline the screening process for titles and abstracts in systematic reviews, reduce the workload for reviewers, and minimize errors. This study validated artificial intelligence (AI) tools, specifically Llama 3 70B via Groq's application programming interface (API) and ChatGPT-4o mini via OpenAI's API, for automating this process in biomedical research. It compared these AI tools with human reviewers using 1,081 articles after duplicate removal. Each AI model was tested in three configurations to assess sensitivity, specificity, predictive values, and likelihood ratios. The Llama 3 model's LLA_2 configuration achieved 77.5% sensitivity and 91.4% specificity, with 90.2% accuracy, a positive predictive value (PPV) of 44.3%, and a negative predictive value (NPV) of 97.9%. The ChatGPT-4o mini model's CHAT_2 configuration showed 56.2% sensitivity, 95.1% specificity, 92.0% accuracy, a PPV of 50.6%, and an NPV of 96.1%. Both models demonstrated strong specificity, with CHAT_2 having higher overall accuracy. Despite these promising results, manual validation remains necessary to address false positives and negatives, ensuring that no important studies are overlooked. This study suggests that AI can significantly enhance efficiency and accuracy in systematic reviews, potentially revolutionizing not only biomedical research but also other fields requiring extensive literature reviews.
Title: Validation of large language models (Llama 3 and ChatGPT-4o mini) for title and abstract screening in biomedical systematic reviews | Journal: Research Synthesis Methods, 16(4), 620-630 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12623132/pdf/
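All the metrics reported for LLA_2 and CHAT_2 derive from a 2x2 table. The table cross-classifies the AI tool's include/exclude decisions against the human reviewers' decisions, which serve as the reference standard. Here is a minimal sketch that uses hypothetical counts rather than the study's data:

```python
def screening_metrics(tp, fp, fn, tn):
    """Accuracy metrics for an AI screener judged against human reviewers.

    tp/fn: records the humans included that the tool included/excluded;
    fp/tn: records the humans excluded that the tool included/excluded.
    Raises ZeroDivisionError for degenerate tables (e.g. specificity of 1
    makes the positive likelihood ratio undefined).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical 2x2 counts for 200 records, not the study's data.
for name, value in screening_metrics(tp=40, fp=20, fn=10, tn=130).items():
    print(f"{name}: {value:.3f}")
```

A tool can combine high specificity with a modest PPV when relevant records are only a small share of those screened. This is consistent with the PPVs of 44.3% and 50.6% reported above despite specificities above 91%.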
Pub Date: 2025-07-01 | Epub Date: 2025-05-15 | DOI: 10.1017/rsm.2025.21
Harlan Campbell, Dylan Maciel, Keith Chan, Jeroen P Jansen, Sven Klijn, Kevin Towle, Bill Malcolm, Shannon Cope
The importance of network meta-analysis (NMA) methods for time-to-event (TTE) outcomes that do not rely on the proportional hazards (PH) assumption is increasingly recognized in oncology, where clinical trials evaluating new interventions versus standard comparators often violate this assumption. However, existing NMA methods that allow for time-varying treatment effects do not directly leverage the individual event and censoring times that can be reconstructed from Kaplan-Meier curves, which may be more accurate than discrete hazards. These methods are also challenging to implement, given reparameterizations that rely on discrete hazards. Additionally, two-step methods require assumptions regarding within-study normality and variance. We propose a one-step fully Bayesian parametric individual patient data (IPD)-NMA model that fits TTE data with the exact likelihood and allows for time-varying treatment effects. We define fixed or random effects with the following distributions: Weibull, Gompertz, log-normal, log-logistic, gamma, or generalized gamma. We apply the one-step model to a network of randomized controlled trials (RCTs) evaluating multiple interventions for advanced melanoma and compare the results with those obtained with the two-step approach. Additionally, a simulation study was performed to compare the proposed one-step method with the two-step method. The one-step method allows for straightforward model selection among the "standard" distributions, now including gamma and generalized gamma, with treatment effects either on the scale parameter alone or on several parameters jointly (multivariate treatment effects). The generalized gamma distribution offers the flexibility to model U-shaped hazards within a network of RCTs, with an accessible interpretation of its parameters, and simplifies to the exponential, Weibull, log-normal, or gamma distribution in special cases.
Title: One-step parametric network meta-analysis models using the exact likelihood that allow for time-varying treatment effects | Journal: Research Synthesis Methods, 16(4), 650-671 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12527511/pdf/
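To make the special cases concrete, the sketch below uses Stacy's three-parameter generalized gamma density with scale a and shape parameters d and p. This parameterization is an assumption; the paper may use another, such as Prentice's. The special cases checked here are:

- d = p gives the Weibull distribution.
- p = 1 gives the gamma distribution.
- d = p = 1 gives the exponential distribution.

The log-normal arises only as a limiting case, so it is not checked. All function names are hypothetical helpers written with the standard library.

```python
import math


def gengamma_pdf(t, a, d, p):
    """Stacy's generalized gamma density (t > 0, scale a, shapes d and p):
        f(t) = p / (a**d * Gamma(d / p)) * t**(d - 1) * exp(-(t / a)**p)
    computed on the log scale for numerical stability."""
    log_f = (math.log(p) - d * math.log(a) - math.lgamma(d / p)
             + (d - 1) * math.log(t) - (t / a) ** p)
    return math.exp(log_f)


def weibull_pdf(t, shape, scale):
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-z ** shape)


def gamma_pdf(t, shape, scale):
    return math.exp((shape - 1) * math.log(t) - t / scale
                    - math.lgamma(shape) - shape * math.log(scale))


def exponential_pdf(t, rate):
    return rate * math.exp(-rate * t)


t, a = 1.7, 2.0
print(gengamma_pdf(t, a, d=1.5, p=1.5), weibull_pdf(t, shape=1.5, scale=a))
print(gengamma_pdf(t, a, d=3.0, p=1.0), gamma_pdf(t, shape=3.0, scale=a))
print(gengamma_pdf(t, a, d=1.0, p=1.0), exponential_pdf(t, rate=1.0 / a))
```

Each printed pair should agree up to floating-point rounding. In a regression setting, treatment effects would enter through a or through the shape parameters; that is the distinction the abstract draws between scale-only and multivariate treatment effects.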
Pub Date: 2025-07-01 | Epub Date: 2025-04-25 | DOI: 10.1017/rsm.2025.18
Amalia Karahalios, Ian R White, Simon L Turner, Georgia Salanti, G Peter Herbison, Areti Angeliki Veroniki, Adriani Nikolakopoulou, Joanne E McKenzie
Network meta-analysis allows the synthesis of relative effects from several treatments. Two broad approaches are available to synthesize the data, arm-synthesis and contrast-synthesis, and several models can be fitted within each. Few evaluations comparing these approaches are available. We re-analyzed 118 networks of interventions with binary outcomes using three contrast-synthesis models (CSM; one fitted in a frequentist framework and two in a Bayesian framework) and two arm-synthesis models (ASM; both fitted in a Bayesian framework). We compared the estimated log odds ratios, their standard errors, ranking measures and the between-trial heterogeneity using the different models, and investigated whether differences in the results were modified by network characteristics. In general, we observed good agreement between the two Bayesian CSMs with respect to the odds ratios, their standard errors and the ranking metrics. However, differences were observed when comparing the frequentist CSM and the ASMs with each other and with the Bayesian CSMs. The network characteristics that we investigated, which represented the connectedness of the networks and the rarity of events, were associated with the differences observed between models, but no single factor was associated with the differences across all of the metrics. In conclusion, we found that different models used to synthesize evidence in a network meta-analysis (NMA) can yield different estimates of odds ratios and standard errors, which can affect the final ranking of the treatment options compared.
Title: An investigation of the impact of using contrast- and arm-synthesis models for network meta-analysis | Journal: Research Synthesis Methods, 16(4), 631-649 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12527487/pdf/
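The difference between the two approaches can be seen at the level of a single two-arm trial. An arm-synthesis model works with each arm's absolute log odds. A contrast-synthesis model fitted in two stages, as frequentist implementations commonly are, typically starts from each trial's log odds ratio and its standard error. The sketch below uses hypothetical counts and makes two points:

- For a two-arm trial, the contrast is the difference of the two arm log odds.
- A zero cell needs a continuity correction before a contrast can be computed; this is where rare events begin to affect contrast-based inputs. The 0.5 correction used here is one common convention and is an assumption.

The function names are illustrative.

```python
import math


def arm_log_odds(events, total):
    """Arm-level log odds (undefined when events is 0 or equals total)."""
    return math.log(events / (total - events))


def log_odds_ratio(e1, n1, e2, n2, correction=0.5):
    """Study-level contrast: log odds ratio of arm 2 vs arm 1 with its Woolf
    standard error. Adds `correction` to every cell only if some cell is zero."""
    cells = [e1, n1 - e1, e2, n2 - e2]
    if 0 in cells:
        cells = [c + correction for c in cells]
    a, b, c, d = cells
    log_or = math.log((c / d) / (a / b))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


# Hypothetical trial: 10/100 events in arm 1, 20/100 in arm 2.
print(log_odds_ratio(10, 100, 20, 100))
print(arm_log_odds(20, 100) - arm_log_odds(10, 100))  # same contrast, from arms
# A rare-event trial with a zero cell still yields a (corrected) contrast.
print(log_odds_ratio(0, 50, 2, 50))
```

For the first trial, the log odds ratio is ln(2.25) with a standard error of 5/12. The zero-cell trial returns a finite but imprecise estimate whose value depends on the chosen correction.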
Pub Date: 2025-07-01 | Epub Date: 2025-06-09 | DOI: 10.1017/rsm.2025.10016
Alexander Pachanov, Catharina Muente, Julian Hirt, Dawid Pieper
We developed a geographic search filter for retrieving studies about Germany from PubMed. In this study, we aimed to translate and validate it for use in Embase and MEDLINE(R) ALL via Ovid. Adjustments included aligning PubMed field tags with Ovid's syntax, adding a keyword heading field for both databases, and incorporating a correspondence address field for Embase. To validate the filters, we used systematic reviews (SRs) that included studies about Germany without imposing geographic restrictions on their search strategies. Subsequently, we conducted (i) case studies (CSs), applying the filters to the search strategies of the 17 eligible SRs; and (ii) aggregation studies, combining the SRs' search strategies with the 'OR' operator and applying the filters. In the CSs, the filters demonstrated a median sensitivity of 100% in both databases, with interquartile ranges (IQRs) of 100%-100% in Embase and 93.75%-100% in MEDLINE(R) ALL. Median precision improved from 0.11% (IQR: 0.05%-0.30%) to 1.65% (IQR: 0.78%-3.06%) and from 0.19% (IQR: 0.11%-0.60%) to 5.13% (IQR: 1.77%-6.85%), while the number needed to read (NNR) decreased from 893.40 (IQR: 354.81-2,219.58) to 60.44 (IQR: 33.94-128.97) and from 513.29 (IQR: 167.35-930.99) to 19.50 (IQR: 14.66-59.35) for Embase and MEDLINE(R) ALL, respectively. In the aggregation studies, the overall sensitivities were 98.19% and 97.14%, with NNRs of 83.29 and 33.34 in Embase and MEDLINE(R) ALL, respectively. The new Embase and MEDLINE(R) ALL filters for Ovid reliably retrieve studies about Germany, enhancing search precision. The approach described in our study can support search filter developers in translating filters for various topics and contexts.
Title: Translation and validation of a geographic search filter to identify studies about Germany in Embase (Ovid) and MEDLINE(R) ALL (Ovid) | Journal: Research Synthesis Methods, 16(4), 688-700 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12527497/pdf/
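The quantities reported for the filters follow from simple set operations on record identifiers:

- Applying a filter to a search is a Boolean AND.
- Combining search strategies in the aggregation studies is a Boolean OR.
- The number needed to read (NNR) is the reciprocal of precision.

Here is a minimal sketch with hypothetical record IDs and helper names:

```python
def aggregate(*strategies):
    """Combine several search strategies with OR: the union of their hits."""
    combined = set()
    for hits in strategies:
        combined |= hits
    return combined


def filter_performance(search_hits, filter_hits, relevant):
    """Apply a filter to a search (strategy AND filter) and score it against
    the studies the reviews actually included.

    Returns (sensitivity, precision, NNR). Raises ZeroDivisionError if any
    denominator is zero (e.g. no relevant study is retrieved).
    """
    retrieved = search_hits & filter_hits  # Boolean AND as set intersection
    found = retrieved & relevant
    sensitivity = len(found) / len(relevant)
    precision = len(found) / len(retrieved)
    return sensitivity, precision, 1.0 / precision  # NNR = 1 / precision


search_a = set(range(0, 600))             # hypothetical hits, strategy A
search_b = set(range(400, 1000))          # hypothetical hits, strategy B
combined = aggregate(search_a, search_b)  # 1,000 unique records
geo_filter = set(range(100))              # records matched by the filter
included = {0, 1, 2, 3, 700}              # studies the reviews included

print(filter_performance(combined, combined, included))    # no filter applied
print(filter_performance(combined, geo_filter, included))  # with the filter
```

With these hypothetical sets, the filter lowers sensitivity from 1.0 to 0.8 but raises precision from 0.005 to 0.04. That cuts the NNR from 200 to 25, the same kind of trade-off the abstract reports.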
Pub Date: 2025-07-01 | Epub Date: 2025-04-25 | DOI: 10.1017/rsm.2025.20
Zahra Premji, Chris Cooper
Trials registry records are more challenging to deduplicate than records of journal-published studies exported from bibliographic databases such as MEDLINE. We demonstrate why this is the case and propose a method to deduplicate registry records from the WHO International Clinical Trials Registry Platform (ICTRP) and ClinicalTrials.gov (CTG), specifically in the reference management tool EndNote (desktop version). We believe that our method is not only more efficient but also minimises the risk of registry records being incorrectly removed as duplicates in automated deduplication. The method has seven steps and is detailed in this tutorial as a step-by-step guide.
Title: Same, same, but different: A method to harmonise and deduplicate study records from WHO ICTRP and ClinicalTrials.gov prior to screening | Journal: Research Synthesis Methods, 16(4), 587-600 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12527485/pdf/
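The authors' method is a seven-step EndNote workflow, which is not reproduced here. As a separate illustration of the underlying idea (harmonising records on a stable key before deduplicating), the sketch below works as follows:

- It matches records on their ClinicalTrials.gov NCT number: the letters NCT followed by eight digits.
- It looks for that number in both the main identifier field and the secondary identifier field.
- It keeps any record without an NCT number rather than risk an incorrect removal.

The record fields `trial_id` and `secondary_ids` and all identifiers are hypothetical.

```python
import re

# ClinicalTrials.gov identifiers: "NCT" followed by eight digits.
NCT_ID = re.compile(r"\bNCT\d{8}\b", re.IGNORECASE)


def trial_key(record):
    """Return the record's NCT number in upper case, or None if neither the
    (hypothetical) 'trial_id' nor 'secondary_ids' field contains one."""
    for field in ("trial_id", "secondary_ids"):
        match = NCT_ID.search(record.get(field, ""))
        if match:
            return match.group(0).upper()
    return None


def deduplicate(records):
    """Keep the first record seen for each NCT number. Records with no NCT
    number are always kept, so nothing is removed on a guess."""
    seen = set()
    kept = []
    for record in records:
        key = trial_key(record)
        if key is None or key not in seen:
            kept.append(record)
            if key is not None:
                seen.add(key)
    return kept


records = [
    {"source": "CTG", "trial_id": "NCT01234567"},
    {"source": "ICTRP", "trial_id": "nct01234567"},
    {"source": "ICTRP", "trial_id": "ISRCTN12345678"},
    {"source": "ICTRP", "trial_id": "EUCTR2010-000000-00", "secondary_ids": "NCT07654321"},
    {"source": "CTG", "trial_id": "NCT07654321"},
]
for record in deduplicate(records):
    print(record["source"], record["trial_id"])
```

Matching on a registry identifier, rather than on title or author fields, avoids false matches between distinct trials with similar titles. The trade-off is that duplicates lacking a shared identifier still need manual review.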
Pub Date: 2025-07-01 | Epub Date: 2025-04-24 | DOI: 10.1017/rsm.2025.16
Justin Clark, Belinda Barton, Loai Albarqouni, Oyungerel Byambasuren, Tanisha Jowsey, Justin Keogh, Tian Liang, Christian Moro, Hayley O'Neill, Mark Jones
Introduction: With the increasing accessibility of tools such as ChatGPT, Copilot, DeepSeek, Dall-E, and Gemini, generative artificial intelligence (GenAI) has been positioned as a potential time-saving research tool, especially for synthesising evidence. Our objective was to determine whether GenAI can assist with evidence synthesis by comparing its accuracy, error rates, and time savings with those of the traditional expert-driven approach.
Methods: To systematically review the evidence, we searched five databases on 17 January 2025, synthesised outcomes reporting on the accuracy, error rates, or time taken, and appraised the risk-of-bias using a modified version of QUADAS-2.
Results: We identified 3,071 unique records, 19 of which were included in our review. Most studies had a high or unclear risk-of-bias in Domain 1A: review selection, Domain 2A: GenAI conduct, and Domain 1B: applicability of results. When used for (1) searching GenAI missed 68% to 96% (median = 91%) of studies, (2) screening made incorrect inclusion decisions ranging from 0% to 29% (median = 10%); and incorrect exclusion decisions ranging from 1% to 83% (median = 28%), (3) incorrect data extractions ranging from 4% to 31% (median = 14%), (4) incorrect risk-of-bias assessments ranging from 10% to 56% (median = 27%).
Conclusion: Our review shows that the current evidence does not support GenAI use in evidence synthesis without human involvement or oversight. However, for most tasks other than searching, GenAI may have a role in assisting humans with evidence synthesis.
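The screening results above are reported as incorrect inclusion and incorrect exclusion rates, summarised across studies as a range with a median. A minimal sketch of how such figures could be computed follows, assuming that GenAI decisions are compared against a human reference standard and that incorrect inclusions are counted among records the humans excluded while incorrect exclusions are counted among records they included. These denominators, the function names `screening_error_rates` and `summarise`, and the example decisions are hypothetical and are not taken from the review.

```python
import math
from statistics import median


def screening_error_rates(reference, genai):
    """Return (incorrect_inclusion_rate, incorrect_exclusion_rate).

    reference, genai: equal-length sequences of booleans, True = include.
    ASSUMED denominators: incorrect inclusions are counted among records the
    human reference excluded, incorrect exclusions among records it included.
    A rate is NaN when its denominator is empty.
    """
    if len(reference) != len(genai):
        raise ValueError("decision lists must have equal length")
    # GenAI decisions on the records the human reviewers excluded / included
    on_excluded = [g for r, g in zip(reference, genai) if not r]
    on_included = [g for r, g in zip(reference, genai) if r]
    wrong_incl = sum(on_excluded) / len(on_excluded) if on_excluded else math.nan
    wrong_excl = (sum(not g for g in on_included) / len(on_included)
                  if on_included else math.nan)
    return wrong_incl, wrong_excl


def summarise(rates):
    """Summarise per-study rates as (min, max, median), i.e. 'ranging from ... (median = ...)'."""
    return min(rates), max(rates), median(rates)


# Hypothetical decisions for five records (not data from the review)
ref = [True, True, False, False, False]
ai = [True, False, True, False, False]
# One of three human-excluded records wrongly included, one of two human-included records wrongly excluded
print(screening_error_rates(ref, ai))
```

Keeping the two error types separate matters because, as the Results show, they can differ widely: incorrect exclusions (missed studies) are usually the costlier error in evidence synthesis.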
{"title":"Generative artificial intelligence use in evidence synthesis: A systematic review.","authors":"Justin Clark, Belinda Barton, Loai Albarqouni, Oyungerel Byambasuren, Tanisha Jowsey, Justin Keogh, Tian Liang, Christian Moro, Hayley O'Neill, Mark Jones","doi":"10.1017/rsm.2025.16","DOIUrl":"10.1017/rsm.2025.16","url":null,"abstract":"<p><strong>Introduction: </strong>With the increasing accessibility of tools such as ChatGPT, Copilot, DeepSeek, Dall-E, and Gemini, generative artificial intelligence (GenAI) has been poised as a potential, research timesaving tool, especially for synthesising evidence. Our objective was to determine whether GenAI can assist with evidence synthesis by assessing its performance using its accuracy, error rates, and time savings compared to the traditional expert-driven approach.</p><p><strong>Methods: </strong>To systematically review the evidence, we searched five databases on 17 January 2025, synthesised outcomes reporting on the accuracy, error rates, or time taken, and appraised the risk-of-bias using a modified version of QUADAS-2.</p><p><strong>Results: </strong>We identified 3,071 unique records, 19 of which were included in our review. Most studies had a high or unclear risk-of-bias in Domain 1A: review selection, Domain 2A: GenAI conduct, and Domain 1B: applicability of results. When used for (1) searching GenAI missed 68% to 96% (median = 91%) of studies, (2) screening made incorrect inclusion decisions ranging from 0% to 29% (median = 10%); and incorrect exclusion decisions ranging from 1% to 83% (median = 28%), (3) incorrect data extractions ranging from 4% to 31% (median = 14%), (4) incorrect risk-of-bias assessments ranging from 10% to 56% (median = 27%).</p><p><strong>Conclusion: </strong>Our review shows that the current evidence does not support GenAI use in evidence synthesis without human involvement or oversight. However, for most tasks other than searching, GenAI may have a role in assisting humans with evidence synthesis.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"16 4","pages":"601-619"},"PeriodicalIF":6.1,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12527500/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-07-01; Epub Date: 2025-04-24; DOI: 10.1017/rsm.2025.19
Xiangji Ying, Konstantinos I Bougioukas, Dawid Pieper, Evan Mayo-Wilson
When conducting overviews of reviews, investigators must measure and describe the extent to which included systematic reviews (SRs) contain the same primary studies. The corrected covered area (CCA) quantifies overlap by counting the primary studies included across a set of SRs. In this article, we introduce a modification to the CCA, the weighted CCA (wCCA), which accounts for differences in the information contributed by primary studies. The wCCA adjusts the original CCA by weighting studies by the square roots of their sample sizes. By weighting primary studies according to their precision, the wCCA provides a useful and complementary representation of overlap in evidence syntheses.
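To make the unweighted versus weighted contrast concrete, the sketch below computes the commonly used CCA from a citation matrix (rows are unique primary studies, columns are reviews), CCA = (N - r) / (r·c - r), where N is the total number of inclusions, r the number of unique studies, and c the number of reviews. The weighted function is an ASSUMED illustration only: it weights each study's repeat inclusions (n_i - 1) by the square root of its sample size, so that equal weights reproduce the CCA exactly. It is not necessarily the exact wCCA formula defined in the article, and the matrix and sample sizes are hypothetical.

```python
import math


def cca(matrix):
    """Corrected covered area.

    matrix: one row per unique primary study, one 0/1 column per review
    (1 = the review includes that study); every row is assumed to contain at
    least one 1. CCA = (N - r) / (r * c - r), where N is the total number of
    inclusions, r the number of unique studies and c the number of reviews.
    """
    r, c = len(matrix), len(matrix[0])
    n_total = sum(sum(row) for row in matrix)
    return (n_total - r) / (r * c - r)


def weighted_cca(matrix, sample_sizes):
    """ASSUMED sqrt(sample size)-weighted variant, for illustration only.

    Weights each study's repeat inclusions (n_i - 1) by sqrt(sample size).
    With equal weights it reduces exactly to cca(); it may differ from the
    published wCCA formula.
    """
    c = len(matrix[0])
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * (sum(row) - 1) for w, row in zip(weights, matrix))
    return numerator / (sum(weights) * (c - 1))


# Hypothetical citation matrix: 4 primary studies x 3 reviews
m = [[1, 1, 1], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
print(cca(m))                                 # 0.375
print(weighted_cca(m, [400, 25, 100, 100]))  # about 0.556
```

Here the largest study is included in all three reviews, so the weighted value rises above the unweighted one: the overlap carries more information than a simple count of shared studies suggests, which is the kind of difference a precision-weighted measure is meant to reveal.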
{"title":"Weighted corrected covered area (wCCA): A measure of informational overlap among reviews.","authors":"Xiangji Ying, Konstantinos I Bougioukas, Dawid Pieper, Evan Mayo-Wilson","doi":"10.1017/rsm.2025.19","DOIUrl":"10.1017/rsm.2025.19","url":null,"abstract":"<p><p>When conducting overviews of reviews, investigators must measure and describe the extent to which included systematic reviews (SRs) contain the same primary studies. The corrected covered area (CCA) quantifies overlap by counting primary studies included across a set of SRs. In this article, we introduce a modification to the CCA, the weighted CCA (wCCA), which accounts for differences in information contributed by primary studies. The wCCA adjusts the original CCA by weighting studies based on the square roots of their sample sizes. By weighting primary studies according to their precision, wCCA provides a useful and complementary representation of overlap in evidence syntheses .</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"16 4","pages":"701-708"},"PeriodicalIF":6.1,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12527530/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}