A tutorial on aggregating evidence from conceptual replication studies using the product Bayes factor
Caspar J. Van Lissa, Eli-Boaz Clapper, Rebecca Kuiper
The product Bayes factor (PBF) synthesizes evidence for an informative hypothesis across heterogeneous replication studies. It can be used when fixed- or random-effects meta-analysis falls short, for example, when effect sizes are incomparable and cannot be pooled, or when studies diverge markedly in the populations, study designs, and measures used. The PBF shines as a solution for small-sample meta-analyses, where the number of between-study differences is often large relative to the number of studies, precluding the use of meta-regression to account for these differences. Users should be mindful that the PBF answers a qualitatively different research question than other evidence synthesis methods: whereas fixed-effect meta-analysis estimates the size of a population effect, the PBF quantifies the extent to which an informative hypothesis is supported in all included studies. This tutorial paper showcases the user-friendly PBF functionality within the bain R package. This new implementation of an existing method was validated in a simulation study, available in an Online Supplement. Results showed that the PBF had high overall accuracy, due to greater sensitivity and lower specificity, compared with random-effects meta-analysis, individual participant data meta-analysis, and vote counting. Tutorials demonstrate applications of the method to meta-analytic and individual participant data. The example datasets, based on published research, are included in bain so that readers can reproduce the examples and apply the code to their own data. The PBF is a promising method for synthesizing evidence for informative hypotheses across conceptual replications that are not suitable for conventional meta-analysis.
Research Synthesis Methods, 15(6), 1231–1243. Published 23 October 2024. doi:10.1002/jrsm.1765
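As a rough illustration of the workflow the tutorial covers, the sketch below computes a Bayes factor for the same informative hypothesis in each of several simulated two-group studies with the bain package and multiplies them into a product Bayes factor. The simulated data, the choice of hypothesis ("x > y"), and the BF.u column used to extract each per-study Bayes factor are assumptions for illustration, not the article's example data.

```r
# Minimal sketch: evaluate the informative hypothesis "x > y" (treatment mean
# exceeds control mean) in each simulated study with bain, then multiply the
# per-study Bayes factors into a product Bayes factor (PBF).
library(bain)

set.seed(1)
simulate_study <- function(n_per_group, delta) {
  data.frame(y = c(rnorm(n_per_group, delta), rnorm(n_per_group, 0)),
             group = rep(c("treatment", "control"), each = n_per_group))
}
studies <- list(simulate_study(40, 0.4), simulate_study(25, 0.6), simulate_study(60, 0.2))

bf_per_study <- sapply(studies, function(d) {
  tt  <- t_test(d$y[d$group == "treatment"], d$y[d$group == "control"])  # bain's t.test wrapper
  res <- bain(tt, "x > y")        # informative hypothesis: treatment > control
  res$fit$BF.u[1]                 # BF of H1 vs. the unconstrained hypothesis; column name
                                  # assumed here, check print(res) in your bain version
})

pbf <- prod(bf_per_study)         # the PBF aggregates evidence across the replications
bf_per_study
pbf
```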
Evolving use of the Cochrane Risk of Bias 2 tool in biomedical systematic reviews
Livia Puljak, Andrija Babić, Ognjen Barčot, Tina Poklepović Peričić
Research Synthesis Methods, 15(6), 1246–1247. Published 23 October 2024. doi:10.1002/jrsm.1756
Exploring methodological approaches used in network meta-analysis of psychological interventions: A scoping review
Kansak Boonpattharatthiti, Garin Ruenin, Pun Kulwong, Jitsupa Lueawattanasakul, Chintra Saechao, Panitan Pitak, Deborah M. Caldwell, Nathorn Chaiyakunapruk, Teerapon Dhippayom
Psychological interventions are complex in nature and have been shown to benefit various clinical outcomes. Gaining insight into current practices would help identify specific aspects that need improvement to enhance the quality of network meta-analysis (NMA) in this field. This scoping review aimed to explore methodological approaches in the NMA of psychological interventions. We searched PubMed, EMBASE, and Cochrane CENTRAL in September 2023. We included NMAs of randomized controlled trials of psychological interventions that reported clinical outcomes. Three independent researchers assessed eligibility and extracted relevant data. The findings were presented using descriptive statistics. Of the 1827 articles identified, 187 studies were included. Prior protocol registration was reported in 130 studies (69.5%). Forty-six studies (24.6%) attempted to search for gray literature. Ninety-four studies (50.3%) explicitly assessed transitivity. Nearly three-quarters (143 studies, 76.5%) classified treatment nodes by the type of psychological intervention, while 13 studies (7.0%) did so by lumping different intervention types into broader intervention classes. Seven studies (3.7%) examined active components of the intervention using component NMA. Only three studies (1.6%) classified interventions based on factors affecting intervention practices, specifically intensity, provider, and delivery platform. Meanwhile, 29 studies (15.5%) explored the influential effects of these factors using meta-regression, subgroup analysis, or sensitivity analysis. The certainty of evidence was assessed in 80 studies (42.8%). Methodological approaches in NMAs of psychological interventions should be improved, specifically in classifying psychological interventions into treatment nodes, exploring the effects of intervention-related factors, and assessing the certainty of evidence.
Research Synthesis Methods, 15(6), 1161–1174. Published 23 October 2024. doi:10.1002/jrsm.1764
An evaluation of the performance of stopping rules in AI-aided screening for psychological meta-analytical research
Lars König, Steffen Zitzmann, Tim Fütterer, Diego G. Campos, Ronny Scherer, Martin Hecht
Several AI-aided screening tools have emerged to tackle the ever-expanding body of literature. These tools employ active learning, where algorithms sort abstracts based on human feedback. However, researchers using these tools face a crucial dilemma: When should they stop screening without knowing the proportion of relevant studies? Although numerous stopping rules have been proposed to guide users in this decision, they have yet to undergo comprehensive evaluation. In this study, we evaluated the performance of three stopping rules: the knee method, a data-driven heuristic, and a prevalence estimation technique. We measured performance via sensitivity, specificity, and screening cost and explored the influence of the prevalence of relevant studies and the choice of the learning algorithm. We curated a dataset of abstract collections from meta-analyses across five psychological research domains. Our findings revealed performance differences between stopping rules regarding all performance measures and variations in the performance of stopping rules across different prevalence ratios. Moreover, despite the relatively minor impact of the learning algorithm, we found that specific combinations of stopping rules and learning algorithms were most effective for certain prevalence ratios of relevant abstracts. Based on these results, we derived practical recommendations for users of AI-aided screening tools. Furthermore, we discuss possible implications and offer suggestions for future research.
Research Synthesis Methods, 15(6), 1120–1146. Published 16 October 2024. doi:10.1002/jrsm.1762
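To make the notion of a stopping rule concrete, the sketch below implements one very simple data-driven heuristic: stop screening after a fixed run of consecutive irrelevant abstracts in the ranked list, then report the resulting recall (sensitivity) and screening cost. This is an assumed toy rule applied to simulated labels, not the knee method or prevalence-estimation technique evaluated in the study.

```r
# Toy stopping heuristic: stop once `consecutive_irrelevant` screened abstracts
# in a row are all irrelevant, and report what was found up to that point.
screen_with_stopping_rule <- function(labels_in_ranked_order, consecutive_irrelevant = 50) {
  run <- 0
  for (i in seq_along(labels_in_ranked_order)) {
    run <- if (labels_in_ranked_order[i] == 1) 0 else run + 1
    if (run >= consecutive_irrelevant) {
      found <- sum(labels_in_ranked_order[1:i])
      return(list(stopped_at = i,
                  found      = found,
                  recall     = found / sum(labels_in_ranked_order),
                  cost       = i / length(labels_in_ranked_order)))
    }
  }
  list(stopped_at = length(labels_in_ranked_order),
       found = sum(labels_in_ranked_order), recall = 1, cost = 1)
}

# Example: 2000 abstracts, a few percent relevant, with relevant records
# concentrated near the top of the ranking (as an active-learning tool aims for).
set.seed(42)
relevance <- rbinom(2000, 1, prob = 0.05 * exp(-(1:2000) / 400))
screen_with_stopping_rule(relevance, consecutive_irrelevant = 50)
```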
Development and validation of a geographic search filter for MEDLINE (PubMed) to identify studies about Germany
Alexander Pachanov, Catharina Münte, Julian Hirt, Dawid Pieper
While geographic search filters exist, few of them are validated and there are currently none that focus on Germany. We aimed to develop and validate a highly sensitive geographic search filter for MEDLINE (PubMed) that identifies studies about Germany. First, using the relative recall method, we created a gold standard set of studies about Germany, dividing it into ‘development’ and ‘testing’ sets. Next, candidate search terms were identified using (i) term frequency analyses in the ‘development’ set and a random set of MEDLINE records; and (ii) a list of German geographic locations compiled by our team. Then, we iteratively created the filter, evaluating it against the ‘development’ and ‘testing’ sets. To validate the filter, we conducted a number of case studies (CSs) and a simulation study. For this validation we used systematic reviews (SRs) that had included studies about Germany but did not restrict their search strategy geographically. When applying the filter to the original search strategies of the 17 SRs eligible for CSs, the median precision was 2.64% (interquartile range [IQR]: 1.34%–6.88%) versus 0.16% (IQR: 0.10%–0.49%) without the filter. The median number-needed-to-read (NNR) decreased from 625 (IQR: 211–1042) to 38 (IQR: 15–76). The filter achieved 100% sensitivity in 13 CSs, 85.71% in 2 CSs, and 87.50% and 80% in the remaining 2 CSs. In a simulation study, the filter demonstrated an overall sensitivity of 97.19% and an NNR of 42. The filter reliably identifies studies about Germany, enhances screening efficiency, and can be applied in evidence syntheses focusing on Germany.
Research Synthesis Methods, 15(6), 1147–1160. Published 15 October 2024. doi:10.1002/jrsm.1763
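The two headline metrics are simple to compute: precision is the share of retrieved records that are relevant, and the number-needed-to-read (NNR) is its reciprocal. The counts below are invented, chosen only so that the results roughly reproduce the median values quoted above.

```r
# Illustrative retrieval counts (not taken from the article) showing how
# precision and NNR = 1/precision respond to applying a geographic filter.
retrieved_without_filter <- 12500
retrieved_with_filter    <- 760
relevant_retrieved       <- 20    # records about Germany found in both runs

precision_without <- relevant_retrieved / retrieved_without_filter
precision_with    <- relevant_retrieved / retrieved_with_filter

c(precision_without = precision_without,      # 0.0016, i.e., 0.16%
  precision_with    = precision_with,         # ~0.026, i.e., ~2.6%
  nnr_without       = 1 / precision_without,  # 625 records read per relevant record
  nnr_with          = 1 / precision_with)     # ~38 records read per relevant record
```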
Mapping between measurement scales in meta-analysis, with application to measures of body mass index in children
Annabel L. Davies, A. E. Ades, Julian P. T. Higgins
Quantitative evidence synthesis methods aim to combine data from multiple medical trials to infer relative effects of different interventions. A challenge arises when trials report continuous outcomes on different measurement scales. To include all evidence in one coherent analysis, we require methods to “map” the outcomes onto a single scale. This is particularly challenging when trials report aggregate rather than individual data. We are motivated by a meta-analysis of interventions to prevent obesity in children. Trials report aggregate measurements of body mass index (BMI) either expressed as raw values or standardized for age and sex. We develop three methods for mapping between aggregate BMI data using known or estimated relationships between measurements on different scales at the individual level. The first is an analytical method based on the mathematical definitions of z-scores and percentiles. The other two approaches involve sampling individual participant data on which to perform the conversions. One method is a straightforward sampling routine, while the other involves optimization with respect to the reported outcomes. In contrast to the analytical approach, these methods also have wider applicability for mapping between any pair of measurement scales with known or estimable individual-level relationships. We verify and contrast our methods using simulation studies and trials from our data set which report outcomes on multiple scales. We find that all methods recreate mean values with reasonable accuracy, but for standard deviations, optimization outperforms the other methods. However, the optimization method is more likely to underestimate standard deviations and is vulnerable to non-convergence.
Research Synthesis Methods, 15(6), 1072–1093. Published 2 October 2024. doi:10.1002/jrsm.1758
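The analytical mapping rests on the individual-level definition of a z-score. The sketch below shows that relationship in its simplest form, with made-up reference values and without the skewness (LMS) adjustment that real BMI-for-age references such as WHO or CDC use; the article's methods additionally handle aggregate (mean and SD) data, which this sketch does not.

```r
# Simplified individual-level mapping between raw BMI, BMI z-score, and
# percentile, using a hypothetical age- and sex-specific reference mean and SD.
ref_mean_bmi <- 17.4   # hypothetical reference mean BMI for a given age/sex
ref_sd_bmi   <- 2.1    # hypothetical reference SD

bmi_to_zscore    <- function(bmi) (bmi - ref_mean_bmi) / ref_sd_bmi
zscore_to_bmi    <- function(z)   ref_mean_bmi + z * ref_sd_bmi
zscore_to_pctile <- function(z)   100 * pnorm(z)   # percentile from the normal CDF

bmi <- c(15.8, 17.4, 21.6)
z   <- bmi_to_zscore(bmi)
cbind(bmi,
      z                = round(z, 2),
      percentile       = round(zscore_to_pctile(z), 1),
      back_transformed = zscore_to_bmi(z))   # recovers the raw BMI values
```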
Towards the automatic risk of bias assessment on randomized controlled trials: A comparison of RobotReviewer and humans
Yuan Tian, Xi Yang, Suhail A. Doi, Luis Furuya-Kanamori, Lifeng Lin, Joey S. W. Kwong, Chang Xu
RobotReviewer is a tool for automatically assessing the risk of bias in randomized controlled trials, but there is limited evidence of its reliability. We evaluated the agreement between RobotReviewer and humans regarding risk of bias assessment based on 1955 randomized controlled trials. The risk of bias in these trials was assessed via two different approaches: (1) manually by human reviewers, and (2) automatically by RobotReviewer. The manual assessment was conducted by two groups independently, with two additional rounds of verification. The agreement between RobotReviewer and humans was measured via the concordance rate and Cohen's kappa statistic, based on the binary classification of risk of bias (low vs. high/unclear) as restricted by RobotReviewer. The concordance rates varied by domain, ranging from 63.07% to 83.32%. Cohen's kappa statistics showed poor agreement between humans and RobotReviewer for allocation concealment (κ = 0.25, 95% CI: 0.21–0.30) and blinding of outcome assessors (κ = 0.27, 95% CI: 0.23–0.31), while agreement was moderate for random sequence generation (κ = 0.46, 95% CI: 0.41–0.50) and blinding of participants and personnel (κ = 0.59, 95% CI: 0.55–0.64). The findings demonstrate domain-specific differences in the level of agreement between RobotReviewer and humans. We suggest that RobotReviewer might be a useful auxiliary tool, but the specific manner of its integration as a complementary tool requires further discussion.
Research Synthesis Methods, 15(6), 1111–1119. Published 26 September 2024. doi:10.1002/jrsm.1761
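The agreement statistics reported above are straightforward to reproduce. The sketch below computes the concordance rate and Cohen's kappa for two sets of binary risk-of-bias judgements; the labels are simulated rather than taken from the 1955 trials.

```r
# Concordance rate and Cohen's kappa for two raters giving binary
# risk-of-bias judgements (low vs. high/unclear) on the same trials.
cohen_kappa <- function(r1, r2) {
  tab <- table(r1, r2)
  n   <- sum(tab)
  po  <- sum(diag(tab)) / n                       # observed agreement
  pe  <- sum(rowSums(tab) * colSums(tab)) / n^2   # agreement expected by chance
  (po - pe) / (1 - pe)
}

set.seed(7)
human <- sample(c("low", "high/unclear"), 1955, replace = TRUE, prob = c(0.4, 0.6))
robot <- ifelse(runif(1955) < 0.75, human,        # keep the human label with prob 0.75,
                sample(c("low", "high/unclear"),  # otherwise relabel at random
                       1955, replace = TRUE))

c(concordance = mean(human == robot), kappa = cohen_kappa(human, robot))
```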
Uncertain about uncertainty in matching-adjusted indirect comparisons? A simulation study to compare methods for variance estimation
Conor O. Chandler, Irina Proskorovsky
In health technology assessment, matching-adjusted indirect comparison (MAIC) is the most common method for pairwise comparisons that control for imbalances in baseline characteristics across trials. One of the primary challenges in MAIC is the need to properly account for the additional uncertainty introduced by the matching process. Limited evidence and guidance are available on variance estimation in MAICs. Therefore, we conducted a comprehensive Monte Carlo simulation study to evaluate the performance of different statistical methods across 108 scenarios. Four general approaches for variance estimation were compared in both anchored and unanchored MAICs of binary and time-to-event outcomes: (1) conventional estimators (CE) using raw weights; (2) CE using weights rescaled to the effective sample size (ESS); (3) robust sandwich estimators; and (4) bootstrapping. Several variants of sandwich estimators and bootstrap methods were tested. Performance was quantified on the basis of empirical coverage probabilities for 95% confidence intervals and variability ratios. Variability was underestimated by CE + raw weights when population overlap was poor or moderate. Despite several theoretical limitations, CE + ESS weights accurately estimated uncertainty across most scenarios. Original implementations of sandwich estimators had a downward bias in MAICs with a small ESS, and finite sample adjustments led to marked improvements. Bootstrapping was unstable if population overlap was poor and the sample size was limited. All methods produced valid coverage probabilities and standard errors in cases of strong population overlap. Our findings indicate that the sample size, population overlap, and outcome type are important considerations for variance estimation in MAICs.
Research Synthesis Methods, 15(6), 1094–1110. Published 25 September 2024. doi:10.1002/jrsm.1759
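For readers unfamiliar with the quantities being compared, the sketch below derives method-of-moments MAIC weights (the standard Signorovitch-style approach) for simulated individual participant data against assumed aggregate target means, and computes the effective sample size used to rescale weights in the "CE + ESS weights" approach. The covariates, target means, and sample size are assumptions, not the simulation design of the study.

```r
# Method-of-moments MAIC weights: choose weights exp(Xc %*% alpha) so that the
# weighted IPD covariate means match the aggregate means of the target trial,
# then compute the effective sample size (ESS) after weighting.
set.seed(123)
n <- 300
X <- cbind(age = rnorm(n, 55, 8), male = rbinom(n, 1, 0.6))   # IPD covariates
target_means <- c(age = 60, male = 0.5)                       # aggregate comparator trial

Xc <- sweep(X, 2, target_means)            # centre IPD covariates at the target means
objective <- function(a) sum(exp(Xc %*% a))
alpha <- optim(c(0, 0), objective, method = "BFGS")$par
w <- as.vector(exp(Xc %*% alpha))          # weights that balance the covariate means

ess <- sum(w)^2 / sum(w^2)                 # effective sample size after weighting
colSums(w * X) / sum(w)                    # weighted means now equal target_means
c(n = n, ess = round(ess, 1))              # shrinkage of n reflects poor overlap
```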
Visualizing the assumptions of network meta-analysis
Yu-Kang Tu, Pei-Chun Lai, Yen-Ta Huang, James Hodges
Network meta-analysis (NMA) incorporates all available evidence into a general statistical framework for comparing multiple treatments. Standard NMAs make three major assumptions, namely homogeneity, similarity, and consistency, and violating these assumptions threatens an NMA's validity. In this article, we suggest a graphical approach to assessing these assumptions and distinguishing between qualitative and quantitative versions of these assumptions. In our plot, the absolute effect of each treatment arm is plotted against the level of effect modifiers, and the three assumptions of NMA can then be visually evaluated. We use four hypothetical scenarios to show how violating these assumptions can lead to different consequences and difficulties in interpreting an NMA. We present an example of an NMA evaluating steroid use to treat septic shock patients to demonstrate how to use our graphical approach to assess an NMA's assumptions and how this approach can help with interpreting the results. We also show that all three assumptions of NMA can be summarized as an exchangeability assumption. Finally, we discuss how reporting of NMAs can be improved to increase transparency of the analysis and interpretability of the results.
Research Synthesis Methods, 15(6), 1175–1182. Published 23 September 2024. doi:10.1002/jrsm.1760
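A schematic version of the proposed plot can be drawn in a few lines of base R: arm-level absolute effects against an effect modifier, with plotting symbols distinguishing treatments. The data below are invented purely to show the layout; roughly parallel treatment-specific trends over a shared modifier range are what the homogeneity, similarity, and consistency assumptions would lead one to expect.

```r
# Schematic plot: absolute effect per trial arm against an effect modifier,
# with one plotting symbol per treatment (invented data).
set.seed(2024)
trials <- data.frame(
  treatment = rep(c("A", "B", "C"), each = 8),
  modifier  = runif(24, 0, 10)                  # e.g., baseline severity
)
trials$effect <- with(trials,
  0.2 + 0.05 * modifier + c(A = 0, B = 0.3, C = 0.5)[treatment] + rnorm(24, 0, 0.05))

plot(effect ~ modifier, data = trials,
     pch  = c(A = 1, B = 17, C = 15)[trials$treatment],
     xlab = "Effect modifier (e.g., baseline severity)",
     ylab = "Absolute effect in each trial arm")
legend("topleft", legend = c("A", "B", "C"), pch = c(1, 17, 15), title = "Treatment")
```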
Conducting power analysis for meta-analysis with dependent effect sizes: Common guidelines and an introduction to the POMADE R package
Mikkel Helding Vembye, James Eric Pustejovsky, Therese Deocampo Pigott
Sample size and statistical power are important factors to consider when planning a research synthesis. Power analysis methods have been developed for fixed-effect or random-effects models, but until recently these methods were limited to simple data structures with a single, independent effect per study. Recent work has provided power approximation formulas for meta-analyses involving studies with multiple, dependent effect size estimates, which are common in syntheses of social science research. Prior work focused on developing and validating the approximations but did not address the practical challenges encountered in applying them when planning a synthesis involving dependent effect sizes. We aim to facilitate the application of these recent developments by providing practical guidance on how to conduct power analysis for planning a meta-analysis of dependent effect sizes and by introducing a new R package, POMADE, designed for this purpose. We present a comprehensive overview of resources for finding information about the study design features and model parameters needed to conduct power analysis, along with detailed worked examples using the POMADE package. For presenting power analysis findings, we emphasize graphical tools that can depict power under a range of plausible assumptions and introduce a novel plot, the traffic light power plot, for conveying the degree of certainty in one's assumptions.
Research Synthesis Methods, 15(6), 1214–1230. Published 18 September 2024. doi:10.1002/jrsm.1752
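As a simplified illustration of the kind of calculation involved, the sketch below approximates power to detect an average effect in an ordinary random-effects meta-analysis with one effect per study. It is not the POMADE API and ignores the dependent-effects structure the package handles; the effect size, heterogeneity, and sampling-variance values are assumptions chosen for illustration.

```r
# Approximate power for the average effect in a random-effects meta-analysis,
# treating each of J studies as contributing one effect with a common sampling
# variance sigma2 and between-study SD tau.
approx_power <- function(J, mu, tau, sigma2, alpha = 0.05) {
  se_mu <- sqrt((tau^2 + sigma2) / J)         # SE of the estimated average effect
  pnorm(mu / se_mu - qnorm(1 - alpha / 2))    # approximate two-sided power
}

# Power for 10 to 60 studies, an average SMD of 0.20, between-study SD of 0.15,
# and a typical within-study sampling variance of 0.04 (total n of about 100 per study).
J <- seq(10, 60, by = 10)
round(approx_power(J, mu = 0.20, tau = 0.15, sigma2 = 0.04), 2)
```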