Hans-Peter Piepho, Laurence V. Madden, Emlyn R. Williams
Methods of network meta-analysis (NMA) can be classified into arm-based and contrast-based approaches. There are several arm-based approaches, and some of these have been criticized because they recover inter-study information and hence do not obey the principle of concurrent control. Here, we point out that recovery of inter-study information in arm-based NMA can be prevented by fitting a fixed main effect for studies. Advantages of arm-based NMA are discussed.
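To make the fixed-study-effects point concrete, here is a minimal sketch with invented arm-level data, using ordinary least squares as a stand-in for the mixed models used in real NMA: once each study gets its own fixed main effect, the treatment contrast is identified only by within-study comparisons, so no inter-study information is recovered.

```python
import numpy as np

# Hypothetical arm-level data: three two-arm studies comparing treatments A and B.
studies = np.array([0, 0, 1, 1, 2, 2])   # study index of each arm
treat   = np.array([0, 1, 0, 1, 0, 1])   # 0 = A, 1 = B
y       = np.array([1.0, 1.8, 2.0, 2.9, 0.5, 1.4])  # arm-level mean outcomes

# Arm-based model with a FIXED main effect per study:
#   y = alpha[study] + delta * treat
X = np.column_stack([np.eye(3)[studies], treat])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
delta = beta[-1]

# For balanced two-arm studies, delta equals the mean of the
# within-study differences: only concurrent comparisons contribute.
within = np.mean([1.8 - 1.0, 2.9 - 2.0, 1.4 - 0.5])
print(round(delta, 6), round(within, 6))
```

The equality of the two printed numbers is the point: the study dummies absorb all between-study level differences, which is what blocks recovery of inter-study information.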
The use of fixed study main effects in arm-based network meta-analysis. Research Synthesis Methods, 15(5), 747–750. doi: 10.1002/jrsm.1721.
Population-adjusted indirect comparisons, developed in the 2010s, enable comparisons between two treatments in different studies by balancing patient characteristics in the case where individual patient-level data (IPD) are available for only one study. Health technology assessment (HTA) bodies increasingly rely on these methods to inform funding decisions, typically using unanchored indirect comparisons (i.e., without a common comparator), due to the need to evaluate comparative efficacy and safety for single-arm trials. Unanchored matching-adjusted indirect comparison (MAIC) and unanchored simulated treatment comparison (STC) are currently the only two approaches available for population-adjusted indirect comparisons based on single-arm trials. However, there is a notable underutilisation of unanchored STC in HTA, largely due to a lack of understanding of its implementation. We therefore develop a novel way to implement unanchored STC by incorporating standardisation/marginalisation and the NORmal To Anything (NORTA) algorithm for sampling covariates. This methodology aims to derive a suitable marginal treatment effect without aggregation bias for HTA evaluations. We use a non-parametric bootstrap and propose separately calculating the standard error for the IPD study and the comparator study to ensure the appropriate quantification of the uncertainty associated with the estimated treatment effect. The performance of our proposed unanchored STC approach is evaluated through a comprehensive simulation study focused on binary outcomes. Our findings demonstrate that the proposed approach is asymptotically unbiased. We argue that unanchored STC should be considered when conducting unanchored indirect comparisons with single-arm studies, presenting a robust approach for HTA decision-making.
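The NORTA step the abstract mentions can be illustrated in a few lines. This is a generic NORTA sketch with hypothetical covariates (age and sex), not the authors' actual implementation: draw correlated standard normals, push them through the normal CDF to get correlated uniforms, then apply the inverse CDF of each target marginal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Target correlation on the latent normal scale (assumed, for illustration).
corr = np.array([[1.0, 0.5],
                 [0.5, 1.0]])
z = rng.multivariate_normal(np.zeros(2), corr, size=10_000)
u = stats.norm.cdf(z)                 # correlated uniforms in (0, 1)

# Map uniforms to the desired marginals.
age  = stats.norm.ppf(u[:, 0], loc=60, scale=10)  # continuous covariate
male = (u[:, 1] < 0.6).astype(int)                # binary covariate, Pr = 0.6

print(round(age.mean(), 1), round(male.mean(), 3))
```

In an STC workflow, covariate vectors sampled this way would be fed through the fitted outcome model and averaged to obtain a marginal (population-level) treatment effect.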
Shijie Ren, Sa Ren, Nicky J. Welton, Mark Strong (2024). Advancing unanchored simulated treatment comparisons: A novel implementation and simulation study. Research Synthesis Methods, 15(4), 657–670. doi: 10.1002/jrsm.1718.
Some patients benefit from a treatment while others may benefit less or not at all. We previously developed a two-stage network meta-regression prediction model that synthesizes randomized trials and evaluates how treatment effects vary across patient characteristics. In this article, we extend this model to combine different types of data in different formats: aggregate data (AD) and individual participant data (IPD) from randomized and non-randomized evidence. In the first stage, a prognostic model is developed to predict the baseline risk of the outcome using a large cohort study. In the second stage, we recalibrate this prognostic model to improve predictions for patients enrolled in randomized trials. In the third stage, we use the baseline risk as an effect modifier in a network meta-regression model combining AD and IPD from randomized clinical trials to estimate heterogeneous treatment effects. We illustrate the approach in a re-analysis of a network of studies comparing three drugs for relapsing–remitting multiple sclerosis. Several patient characteristics influence the baseline risk of relapse, which in turn modifies the effect of the drugs. The proposed model makes personalized predictions for health outcomes under several treatment options and encompasses all relevant randomized and non-randomized evidence.
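A stripped-down sketch of the baseline-risk-as-effect-modifier idea, with invented numbers and a single pairwise comparison standing in for the full (typically Bayesian) network meta-regression: study-level log odds ratios are regressed on the centred predicted baseline risk.

```python
import numpy as np

# Hypothetical study-level inputs: predicted baseline risk of relapse and
# the observed log odds ratio of treatment vs. control in each study.
baseline_risk = np.array([0.10, 0.20, 0.30, 0.40, 0.50])
log_or        = np.array([-0.20, -0.35, -0.50, -0.65, -0.80])

# Meta-regression: log OR = b0 + b1 * (risk - mean risk).
X = np.column_stack([np.ones_like(baseline_risk),
                     baseline_risk - baseline_risk.mean()])
b0, b1 = np.linalg.lstsq(X, log_or, rcond=None)[0]

# b0: treatment effect at the average baseline risk.
# b1: change in the (log-scale) effect per unit increase in baseline risk.
print(round(b0, 3), round(b1, 3))
```

A negative slope here would mean the drug helps higher-risk patients more, which is exactly the kind of heterogeneity the three-stage model is built to capture and then feed into personalized predictions.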
Konstantina Chalkou, Tasnim Hamza, Pascal Benkert, Jens Kuhle, Chiara Zecca, Gabrielle Simoneau, Fabio Pellegrini, Andrea Manca, Matthias Egger, Georgia Salanti (2024). Combining randomized and non-randomized data to predict heterogeneous effects of competing treatments. Research Synthesis Methods, 15(4), 641–656. doi: 10.1002/jrsm.1717.
Phi-Yen Nguyen, Joanne E. McKenzie, Simon L. Turner, Matthew J. Page, Steve McDonald
Background
Interrupted time series (ITS) studies contribute importantly to systematic reviews of population-level interventions. We aimed to develop and validate search filters to retrieve ITS studies in MEDLINE and PubMed.
Methods
A total of 1017 known ITS studies (published 2013–2017) were analysed using text mining to generate candidate terms. A control set of 1398 time-series studies was used to select differentiating terms. Various combinations of candidate terms were iteratively tested to generate three search filters. An independent set of 700 ITS studies was used to validate the filters' sensitivities. The filters were test-run in Ovid MEDLINE and the records randomly screened for ITS studies to determine their precision. Finally, all MEDLINE filters were translated to PubMed format and their sensitivities in PubMed were estimated.
Results
Three search filters were created in MEDLINE: a precision-maximising filter with high precision (78%; 95% CI 74%–82%) but moderate sensitivity (63%; 59%–66%), most appropriate when there are limited resources to screen studies; a sensitivity-and-precision-maximising filter with higher sensitivity (81%; 77%–83%) but lower precision (32%; 28%–36%), providing a balance between expediency and comprehensiveness; and a sensitivity-maximising filter with high sensitivity (88%; 85%–90%) but likely very low precision, useful when combined with specific content terms. Similar sensitivity estimates were found for PubMed versions.
Conclusion
Our filters strike different balances between comprehensiveness and screening workload and suit different research needs. Retrieval of ITS studies would be improved if authors identified the ITS design in the titles.
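The sensitivity and precision figures above are simple proportions with confidence intervals. A small helper shows the arithmetic, using a normal-approximation interval and hypothetical counts (the paper's exact denominators and interval method are not given here):

```python
import math

def prop_ci(successes, n, z=1.96):
    """Point estimate and approximate 95% CI for a proportion,
    e.g. the sensitivity or precision of a search filter."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

# Hypothetical: a filter retrieving 616 of the 700 validation ITS studies.
sens, lo, hi = prop_ci(616, 700)
print(f"sensitivity {sens:.0%} (95% CI {lo:.0%}-{hi:.0%})")
```

Precision would be computed the same way, with retrieved-and-relevant records over total records screened.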
Development of a search filter to retrieve reports of interrupted time series studies from MEDLINE and PubMed. Research Synthesis Methods, 15(4), 627–640. doi: 10.1002/jrsm.1716.
Qusai Khraisha, Sophie Put, Johanna Kappenberg, Azza Warraitch, Kristin Hadfield
Systematic reviews are vital for guiding practice, research and policy, although they are often slow and labour-intensive. Large language models (LLMs) could speed up and automate systematic reviews, but their performance in such tasks has yet to be comprehensively evaluated against humans, and no study has tested Generative Pre-Trained Transformer (GPT)-4, the biggest LLM so far. This pre-registered study uses a “human-out-of-the-loop” approach to evaluate GPT-4's capability in title/abstract screening, full-text review and data extraction across various literature types and languages. Although GPT-4 had accuracy on par with human performance in some tasks, results were skewed by chance agreement and dataset imbalance. Adjusting for these caused performance scores to drop across all stages: for data extraction, performance was moderate, and for screening, it ranged from none in highly balanced literature datasets (~1:1) to moderate in those datasets where the ratio of inclusion to exclusion in studies was imbalanced (~1:3). When screening full-text literature using highly reliable prompts, GPT-4's performance was more robust, reaching “human-like” levels. Although our findings indicate that, currently, substantial caution should be exercised if LLMs are being used to conduct systematic reviews, they also offer preliminary evidence that, for certain review tasks delivered under specific conditions, LLMs can rival human performance.
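The correction for chance agreement and dataset imbalance that drives the abstract's headline result is standard: raw accuracy can look high on an imbalanced screening set even when chance-adjusted agreement is modest. A sketch with invented include/exclude labels (Cohen's kappa is one common chance-adjusted metric; the paper's exact adjustment may differ):

```python
import numpy as np

def cohens_kappa(a, b):
    """Chance-corrected agreement between two binary raters,
    e.g. model vs. human include/exclude decisions."""
    a, b = np.asarray(a), np.asarray(b)
    po = np.mean(a == b)                       # observed agreement
    pe = (np.mean(a) * np.mean(b)              # agreement expected by chance
          + np.mean(1 - a) * np.mean(1 - b))
    return (po - pe) / (1 - pe)

# Hypothetical imbalanced screening set (~1:3 include:exclude).
human = np.array([1] * 25 + [0] * 75)
model = np.array([1] * 15 + [0] * 10 + [0] * 70 + [1] * 5)

raw = np.mean(human == model)
print(round(raw, 2), round(cohens_kappa(human, model), 2))
```

Here raw agreement is 0.85 while kappa is only about 0.57, illustrating why the authors' adjusted scores dropped across all stages.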
Can large language models replace humans in systematic reviews? Evaluating GPT-4's efficacy in screening and extracting data from peer-reviewed and grey literature in multiple languages. Research Synthesis Methods, 15(4), 616–626. doi: 10.1002/jrsm.1715.
Meta-analysis is a useful tool in clinical research, as it combines the results of multiple clinical studies to improve precision when answering a particular scientific question. While there has been a substantial increase in publications using meta-analysis across clinical research topics, the number of published meta-analyses in metabolomics is significantly lower than in other omics disciplines. Metabolomics is the study of small chemical compounds in living organisms, which provides important insights into an organism's phenotype. However, the wide variety of compounds and the different experimental methods used in metabolomics make it challenging to perform a thorough meta-analysis. Additionally, there is a lack of consensus on reporting statistical estimates, and the high number of compound-name synonyms further complicates the process. Easy-Amanida is a new tool that combines two R packages, “amanida” and “webchem”, to enable meta-analysis of aggregate statistical data, such as p-values and fold-changes, while ensuring harmonization of compound names. The Easy-Amanida app is implemented in Shiny, an R package for building interactive web apps, and provides a workflow to optimize the name matching. This article describes all the steps to perform the meta-analysis using Easy-Amanida, including an illustrative example for interpreting the results. The use of aggregate statistical metrics extends the use of Easy-Amanida beyond the metabolomics field.
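Combining per-study p-values for the same compound into a single aggregate p-value is the core of this kind of aggregate-data meta-analysis. Fisher's method is one standard rule for this (shown here generically; it is not necessarily the exact aggregation rule Amanida implements):

```python
import math
from scipy import stats

def fisher_combine(pvals):
    """Fisher's method: -2 * sum(log p) follows a chi-squared distribution
    with 2k degrees of freedom under the null, for k independent p-values."""
    stat = -2.0 * sum(math.log(p) for p in pvals)
    return stats.chi2.sf(stat, df=2 * len(pvals))

# Hypothetical: one metabolite reported in three studies.
p_combined = fisher_combine([0.04, 0.10, 0.03])
print(p_combined)
```

Three individually borderline results combine to a clearly small aggregate p-value, which is the behaviour that makes such methods useful when only summary statistics are reported.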
Maria Llambrich, Pau Satorra, Eudald Correig, Josep Gumà, Jesús Brezmes, Cristian Tebé, Raquel Cumeras (2024). Easy-Amanida: An R Shiny application for the meta-analysis of aggregate results in clinical metabolomics using Amanida and Webchem. Research Synthesis Methods, 15(4), 687–699. doi: 10.1002/jrsm.1713.
The LFK index has been promoted as an improved method to detect bias in meta-analysis. Putatively, its performance does not depend on the number of studies in the meta-analysis. We conducted a simulation study, comparing the LFK index test to three standard tests for funnel plot asymmetry in settings with smaller or larger group sample sizes. In general, false positive rates of the LFK index test markedly depended on the number and size of studies as well as the between-study heterogeneity with values between 0% and almost 30%. Egger's test adhered well to the pre-specified significance level of 5% under homogeneity, but was too liberal (smaller groups) or conservative (larger groups) under heterogeneity. The rank test was too conservative for most simulation scenarios. The Thompson–Sharp test was too conservative under homogeneity, but adhered well to the significance level in case of heterogeneity. The true positive rate of the LFK index test was only larger compared with classic tests if the false positive rate was inflated. The power of classic tests was similar or larger than the LFK index test if the false positive rate of the LFK index test was used as significance level for the classic tests. Under ideal conditions, the false positive rate of the LFK index test markedly and unpredictably depends on the number and sample size of studies as well as the extent of between-study heterogeneity. The LFK index test in its current implementation should not be used to assess funnel plot asymmetry in meta-analysis.
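For reference, Egger's test, the best-behaved comparator under homogeneity in this simulation, is a weighted regression of the standardized effect on precision, with asymmetry indicated by a non-zero intercept. A self-contained sketch on simulated symmetric data (illustrative only, not the authors' simulation design):

```python
import numpy as np
from scipy import stats

def egger_test(effects, ses):
    """Egger's regression test for funnel-plot asymmetry: regress
    effect/SE on 1/SE; a non-zero intercept suggests small-study effects."""
    effects, ses = np.asarray(effects, float), np.asarray(ses, float)
    res = stats.linregress(1.0 / ses, effects / ses)
    t = res.intercept / res.intercept_stderr
    p = 2.0 * stats.t.sf(abs(t), df=len(effects) - 2)
    return res.intercept, p

# Simulated meta-analysis with a common true effect and no bias,
# so no asymmetry is expected on average.
rng = np.random.default_rng(1)
ses = rng.uniform(0.1, 0.5, size=20)
effects = rng.normal(0.3, ses)
intercept, p = egger_test(effects, ses)
print(round(intercept, 3), round(p, 3))
```

The simulation study's point is that, unlike this test's roughly nominal behaviour under homogeneity, the LFK index test's false positive rate drifts unpredictably with the number and size of studies.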
Guido Schwarzer, Gerta Rücker, Cristina Semaca (2024). LFK index does not reliably detect small-study effects in meta-analysis: A simulation study. Research Synthesis Methods, 15(4), 603–615. doi: 10.1002/jrsm.1714.
Ofir Harari, Mohsen Soltanifar, Joseph C. Cappelleri, Andre Verhoek, Mario Ouwens, Caitlin Daly, and Bart Heeg (2023) Network Meta-Interpolation: Effect modification adjustment in network meta-analysis using subgroup analyses. Research Synthesis Methods, 14: 211–233.
This is also the version that appears in the R code provided to the readers as part of the Supporting Information.
We apologize for these errors.
Correction to “Network Meta-Interpolation: Effect modification adjustment in network meta-analysis using subgroup analyses”. Research Synthesis Methods, 15(2), 369. doi: 10.1002/jrsm.1712.
Gerald Gartlehner, Leila Kahwati, Rainer Hilscher, Ian Thomas, Shannon Kugley, Karen Crotty, Meera Viswanathan, Barbara Nussbaumer-Streit, Graham Booth, Nathaniel Erskine, Amanda Konet, Robert Chew
Data extraction is a crucial, yet labor-intensive and error-prone part of evidence synthesis. To date, efforts to harness machine learning for enhancing efficiency of the data extraction process have fallen short of achieving sufficient accuracy and usability. With the release of large language models (LLMs), new possibilities have emerged to increase efficiency and accuracy of data extraction for evidence synthesis. The objective of this proof-of-concept study was to assess the performance of an LLM (Claude 2) in extracting data elements from published studies, compared with human data extraction as employed in systematic reviews. Our analysis utilized a convenience sample of 10 English-language, open-access publications of randomized controlled trials included in a single systematic review. We selected 16 distinct types of data, posing varying degrees of difficulty (160 data elements across 10 studies). We used the browser version of Claude 2 to upload the portable document format of each publication and then prompted the model for each data element. Across 160 data elements, Claude 2 demonstrated an overall accuracy of 96.3% with a high test–retest reliability (replication 1: 96.9%; replication 2: 95.0% accuracy). Overall, Claude 2 made 6 errors on 160 data items. The most common errors (n = 4) were missed data items. Importantly, Claude 2's ease of use was high; it required no technical expertise or labeled training data for effective operation (i.e., zero-shot learning). Based on findings of our proof-of-concept study, leveraging LLMs has the potential to substantially enhance the efficiency and accuracy of data extraction for evidence syntheses.
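The accuracy bookkeeping behind the reported figures is straightforward: each extracted value is scored against the human reference, and missed items count as errors. A sketch with placeholder labels chosen to reproduce the 6-errors-in-160 tally (the actual data elements are not shown here):

```python
# Hypothetical scoring: 160 items, of which 154 match the human reference,
# 4 were missed (None) and 2 were extracted incorrectly.
reference = ["ref"] * 160
extracted = ["ref"] * 154 + [None] * 4 + ["wrong"] * 2

accuracy = sum(r == e for r, e in zip(reference, extracted)) / len(reference)
errors = len(reference) - sum(r == e for r, e in zip(reference, extracted))
print(round(accuracy, 4), errors)
```

Test-retest reliability is then just the same calculation repeated on independent replications of the extraction run.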
{"title":"Data extraction for evidence synthesis using a large language model: A proof-of-concept study","authors":"Gerald Gartlehner, Leila Kahwati, Rainer Hilscher, Ian Thomas, Shannon Kugley, Karen Crotty, Meera Viswanathan, Barbara Nussbaumer-Streit, Graham Booth, Nathaniel Erskine, Amanda Konet, Robert Chew","doi":"10.1002/jrsm.1710","DOIUrl":"10.1002/jrsm.1710","url":null,"abstract":"<p>Data extraction is a crucial, yet labor-intensive and error-prone part of evidence synthesis. To date, efforts to harness machine learning for enhancing efficiency of the data extraction process have fallen short of achieving sufficient accuracy and usability. With the release of large language models (LLMs), new possibilities have emerged to increase efficiency and accuracy of data extraction for evidence synthesis. The objective of this proof-of-concept study was to assess the performance of an LLM (Claude 2) in extracting data elements from published studies, compared with human data extraction as employed in systematic reviews. Our analysis utilized a convenience sample of 10 English-language, open-access publications of randomized controlled trials included in a single systematic review. We selected 16 distinct types of data, posing varying degrees of difficulty (160 data elements across 10 studies). We used the browser version of Claude 2 to upload the portable document format of each publication and then prompted the model for each data element. Across 160 data elements, Claude 2 demonstrated an overall accuracy of 96.3% with a high test–retest reliability (replication 1: 96.9%; replication 2: 95.0% accuracy). Overall, Claude 2 made 6 errors on 160 data items. The most common errors (<i>n</i> = 4) were missed data items. Importantly, Claude 2's ease of use was high; it required no technical expertise or labeled training data for effective operation (i.e., zero-shot learning). 
Based on findings of our proof-of-concept study, leveraging LLMs has the potential to substantially enhance the efficiency and accuracy of data extraction for evidence syntheses.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"15 4","pages":"576-589"},"PeriodicalIF":5.0,"publicationDate":"2024-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/jrsm.1710","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140020433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
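The evaluation described in this abstract — scoring LLM extractions against human extraction and breaking errors down into missed versus incorrect items — can be sketched as follows. This is a minimal illustration, not the study's actual scoring code; the `score_extractions` helper and the tally of 160 items with 4 missed and 2 incorrect values are hypothetical data constructed to match the reported figures (96.3% accuracy, 6 errors):

```python
def score_extractions(llm_values, human_values):
    """Compare LLM-extracted data elements against a human gold standard.

    Returns overall accuracy, the count of missed items (model returned
    nothing, represented here as None), and the count of incorrect
    extractions (model returned a wrong value).
    """
    assert len(llm_values) == len(human_values)
    correct = missed = wrong = 0
    for llm, human in zip(llm_values, human_values):
        if llm == human:
            correct += 1
        elif llm is None:
            missed += 1
        else:
            wrong += 1
    return correct / len(llm_values), missed, wrong

# Hypothetical tally mirroring the reported results: 160 data elements,
# 6 errors, of which 4 were missed data items.
human = list(range(160))
llm = list(range(160))
for i in range(4):
    llm[i] = None   # missed data items
for i in range(4, 6):
    llm[i] = -1     # incorrect extractions
acc, missed, wrong = score_extractions(llm, human)
# acc = 0.9625, i.e. the 96.3% overall accuracy reported
```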
Stephan B. Bruns, Teshome K. Deressa, T. D. Stanley, Chris Doucouliagos, John P. A. Ioannidis
Using a sample of 70,399 published p-values from 192 meta-analyses, we empirically estimate the counterfactual distribution of p-values in the absence of any biases. Comparing observed p-values with counterfactually expected p-values allows us to estimate how many p-values are published as being statistically significant when they should have been published as non-significant. We estimate the extent of selectively reported p-values to range between 57.7% and 71.9% of the significant p-values. The counterfactual p-value distribution also allows us to assess shifts of p-values along the entire distribution of published p-values, revealing that particularly very small p-values (p < 0.001) are unexpectedly abundant in the published literature. Subsample analysis suggests that the extent of selective reporting is reduced in research fields that use experimental designs, analyze microeconomics research questions, and have at least some adequately powered studies.
{"title":"Estimating the extent of selective reporting: An application to economics","authors":"Stephan B. Bruns, Teshome K. Deressa, T. D. Stanley, Chris Doucouliagos, John P. A. Ioannidis","doi":"10.1002/jrsm.1711","DOIUrl":"10.1002/jrsm.1711","url":null,"abstract":"<p>Using a sample of 70,399 published <i>p</i>-values from 192 meta-analyses, we empirically estimate the counterfactual distribution of <i>p</i>-values in the absence of any biases. Comparing observed <i>p</i>-values with counterfactually expected <i>p</i>-values allows us to estimate how many <i>p</i>-values are published as being statistically significant when they should have been published as non-significant. We estimate the extent of selectively reported <i>p</i>-values to range between 57.7% and 71.9% of the significant <i>p</i>-values. The counterfactual <i>p</i>-value distribution also allows us to assess shifts of <i>p</i>-values along the entire distribution of published <i>p</i>-values, revealing that particularly very small <i>p</i>-values (<i>p</i> < 0.001) are unexpectedly abundant in the published literature. Subsample analysis suggests that the extent of selective reporting is reduced in research fields that use experimental designs, analyze microeconomics research questions, and have at least some adequately powered studies.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"15 4","pages":"590-602"},"PeriodicalIF":5.0,"publicationDate":"2024-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/jrsm.1711","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139911665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}