Methods for information-sharing in network meta-analysis: Implications for inference and policy.
Pub Date: 2025-03-01 | Epub Date: 2025-03-10 | DOI: 10.1017/rsm.2024.17
Georgios F Nikolaidis, Beth Woods, Stephen Palmer, Sylwia Bujkiewicz, Marta O Soares
Limited evidence on relative effectiveness is common in Health Technology Assessment (HTA), often due to sparse evidence on the population of interest or study-design constraints. When evidence directly relating to the policy decision is limited, the evidence base could be extended to incorporate indirectly related evidence. For instance, a sparse evidence base in children could borrow strength from evidence in adults to improve estimation and reduce uncertainty. In HTA, indirect evidence has typically been either disregarded ('splitting'; no information-sharing) or included without considering any differences ('lumping'; full information-sharing). However, sophisticated methods that impose moderate degrees of information-sharing have been proposed. We describe and implement multiple information-sharing methods in a case study evaluating the effectiveness, cost-effectiveness and value of further research of intravenous immunoglobulin for severe sepsis and septic shock. We also provide metrics to determine the degree of information-sharing. Results indicate that method choice can have a significant impact. Across information-sharing models, odds ratio estimates ranged between 0.55 and 0.90 and incremental cost-effectiveness ratios between £16,000 and £52,000 per quality-adjusted life year gained. The need for a future trial also differed by information-sharing model. Heterogeneity in the indirect evidence should also be carefully considered, as it may significantly impact estimates. We conclude that when indirect evidence is relevant to an assessment of effectiveness, the full range of information-sharing methods should be considered. The final selection should be based on a deliberative process that considers not only the plausibility of the methods' assumptions but also the imposed degree of information-sharing.
{"title":"Methods for information-sharing in network meta-analysis: Implications for inference and policy.","authors":"Georgios F Nikolaidis, Beth Woods, Stephen Palmer, Sylwia Bujkiewicz, Marta O Soares","doi":"10.1017/rsm.2024.17","DOIUrl":"10.1017/rsm.2024.17","url":null,"abstract":"<p><p>Limited evidence on relative effectiveness is common in Health Technology Assessment (HTA), often due to sparse evidence on the population of interest or study-design constraints. When evidence directly relating to the policy decision is limited, the evidence base could be extended to incorporate indirectly related evidence. For instance, a sparse evidence base in children could borrow strength from evidence in adults to improve estimation and reduce uncertainty. In HTA, indirect evidence has typically been either disregarded ('splitting'; no information-sharing) or included without considering any differences ('lumping'; full information-sharing). However, sophisticated methods that impose moderate degrees of information-sharing have been proposed. We describe and implement multiple information-sharing methods in a case-study evaluating the effectiveness, cost-effectiveness and value of further research of intravenous immunoglobulin for severe sepsis and septic shock. We also provide metrics to determine the degree of information-sharing. Results indicate that method choice can have significant impact. Across information-sharing models, odds ratio estimates ranged between 0.55 and 0.90 and incremental cost-effectiveness ratios between £16,000-52,000 per quality-adjusted life year gained. The need for a future trial also differed by information-sharing model. Heterogeneity in the indirect evidence should also be carefully considered, as it may significantly impact estimates. We conclude that when indirect evidence is relevant to an assessment of effectiveness, the full range of information-sharing methods should be considered. The final selection should be based on a deliberative process that considers not only the plausibility of the methods' assumptions but also the imposed degree of information-sharing.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"16 2","pages":"291-307"},"PeriodicalIF":6.1,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12527489/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Key concepts and reporting recommendations for mapping reviews: A scoping review of 68 guidance and methodological studies.
Pub Date: 2025-01-01 | Epub Date: 2025-04-01 | DOI: 10.1017/rsm.2024.9
Yanfei Li, Elizabeth Ghogomu, Xu Hui, E Fenfen, Fiona Campbell, Hanan Khalil, Xiuxia Li, Marie Gaarder, Promise M Nduku, Howard White, Liangying Hou, Nan Chen, Shenggang Xu, Ning Ma, Xiaoye Hu, Xian Liu, Vivian Welch, Kehu Yang
Mapping reviews (MRs) are crucial for identifying research gaps and enhancing evidence utilization. Despite their increasing use in health and social sciences, inconsistencies persist in both their conceptualization and reporting. This study aims to clarify the conceptual framework and gather reporting items from existing guidance and methodological studies. A comprehensive search was conducted across nine databases and 11 institutional websites, including documents up to January 2024. A total of 68 documents were included, addressing 24 MR terms and 55 definitions, with 39 documents discussing distinctions and overlaps among these terms. From the documents included, 28 reporting items were identified, covering all the steps of the process. Seven documents mentioned reporting on the title, four on the abstract, and 14 on the background. Ten methods-related items appeared in 56 documents, with the median number of documents supporting each item being 34 (interquartile range [IQR]: 27, 39). Four results-related items were mentioned in 18 documents (median: 14.5, IQR: 11.5, 16), and four discussion-related items appeared in 25 documents (median: 5.5, IQR: 3, 13). There was very little guidance about reporting conclusions, acknowledgments, author contributions, declarations of interest, and funding sources. This study proposes a draft 28-item reporting checklist for MRs and has identified terminologies and concepts used to describe MRs. These findings will first be used to inform a Delphi consensus process to develop reporting guidelines for MRs. Additionally, the checklist and definitions could be used to guide researchers in reporting high-quality MRs.
{"title":"Key concepts and reporting recommendations for mapping reviews: A scoping review of 68 guidance and methodological studies.","authors":"Yanfei Li, Elizabeth Ghogomu, Xu Hui, E Fenfen, Fiona Campbell, Hanan Khalil, Xiuxia Li, Marie Gaarder, Promise M Nduku, Howard White, Liangying Hou, Nan Chen, Shenggang Xu, Ning Ma, Xiaoye Hu, Xian Liu, Vivian Welch, Kehu Yang","doi":"10.1017/rsm.2024.9","DOIUrl":"10.1017/rsm.2024.9","url":null,"abstract":"<p><p>Mapping reviews (MRs) are crucial for identifying research gaps and enhancing evidence utilization. Despite their increasing use in health and social sciences, inconsistencies persist in both their conceptualization and reporting. This study aims to clarify the conceptual framework and gather reporting items from existing guidance and methodological studies. A comprehensive search was conducted across nine databases and 11 institutional websites, including documents up to January 2024. A total of 68 documents were included, addressing 24 MR terms and 55 definitions, with 39 documents discussing distinctions and overlaps among these terms. From the documents included, 28 reporting items were identified, covering all the steps of the process. Seven documents mentioned reporting on the title, four on the abstract, and 14 on the background. Ten methods-related items appeared in 56 documents, with the median number of documents supporting each item being 34 (interquartile range [IQR]: 27, 39). Four results-related items were mentioned in 18 documents (median: 14.5, IQR: 11.5, 16), and four discussion-related items appeared in 25 documents (median: 5.5, IQR: 3, 13). There was very little guidance about reporting conclusions, acknowledgments, author contributions, declarations of interest, and funding sources. This study proposes a draft 28-item reporting checklist for MRs and has identified terminologies and concepts used to describe MRs. These findings will first be used to inform a Delphi consensus process to develop reporting guidelines for MRs. Additionally, the checklist and definitions could be used to guide researchers in reporting high-quality MRs.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"16 1","pages":"157-174"},"PeriodicalIF":6.1,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12631146/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Meta-analysis with Jeffreys priors: Empirical frequentist properties.
Pub Date: 2025-01-01 | Epub Date: 2025-03-12 | DOI: 10.1017/rsm.2024.2
Maya B Mathur
In small meta-analyses (e.g., up to 20 studies), the best-performing frequentist methods can yield very wide confidence intervals for the meta-analytic mean, as well as biased and imprecise estimates of the heterogeneity. We investigate the frequentist performance of alternative Bayesian methods that use the invariant Jeffreys prior. This prior has the usual Bayesian motivation, but also has a purely frequentist motivation: the resulting posterior modes correspond to the established Firth bias correction of the maximum likelihood estimator. We consider two forms of the Jeffreys prior for random-effects meta-analysis: the previously established "Jeffreys1" prior treats the heterogeneity as a nuisance parameter, whereas the "Jeffreys2" prior treats both the mean and the heterogeneity as estimands of interest. In a large simulation study, we assess the performance of both Jeffreys priors, considering different types of Bayesian estimates and intervals. We assess point and interval estimation for both the mean and the heterogeneity parameters, comparing to the best-performing frequentist methods. For small meta-analyses of binary outcomes, the Jeffreys2 prior may offer advantages over standard frequentist methods for point and interval estimation of the mean parameter. In these cases, Jeffreys2 can substantially improve efficiency while more often showing nominal frequentist coverage. However, for small meta-analyses of continuous outcomes, standard frequentist methods seem to remain the best choices. The best-performing method for estimating the heterogeneity varied according to the heterogeneity itself. Röver & Friede's R package bayesmeta implements both Jeffreys priors. We also generalize the Jeffreys2 prior to the case of meta-regression.
{"title":"Meta-analysis with Jeffreys priors: Empirical frequentist properties.","authors":"Maya B Mathur","doi":"10.1017/rsm.2024.2","DOIUrl":"10.1017/rsm.2024.2","url":null,"abstract":"<p><p>In small meta-analyses (e.g., up to 20 studies), the best-performing frequentist methods can yield very wide confidence intervals for the meta-analytic mean, as well as biased and imprecise estimates of the heterogeneity. We investigate the frequentist performance of alternative Bayesian methods that use the invariant Jeffreys prior. This prior has the usual Bayesian motivation, but also has a purely frequentist motivation: the resulting posterior modes correspond to the established Firth bias correction of the maximum likelihood estimator. We consider two forms of the Jeffreys prior for random-effects meta-analysis: the previously established \"Jeffreys1\" prior treats the heterogeneity as a nuisance parameter, whereas the \"Jeffreys2\" prior treats both the mean and the heterogeneity as estimands of interest. In a large simulation study, we assess the performance of both Jeffreys priors, considering different types of Bayesian estimates and intervals. We assess point and interval estimation for both the mean and the heterogeneity parameters, comparing to the best-performing frequentist methods. For small meta-analyses of binary outcomes, the Jeffreys2 prior may offer advantages over standard frequentist methods for point and interval estimation of the mean parameter. In these cases, Jeffreys2 can substantially improve efficiency while more often showing nominal frequentist coverage. However, for small meta-analyses of continuous outcomes, standard frequentist methods seem to remain the best choices. The best-performing method for estimating the heterogeneity varied according to the heterogeneity itself. Röver & Friede's R package bayesmeta implements both Jeffreys priors. We also generalize the Jeffreys2 prior to the case of meta-regression.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"16 1","pages":"87-122"},"PeriodicalIF":6.1,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12621536/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Can machine learning help accelerate article screening for systematic reviews? Yes, when article separability in embedding space is high.
Pub Date: 2025-01-01 | Epub Date: 2025-03-10 | DOI: 10.1017/rsm.2024.16
Farhan Ali, Amanda Swee-Ching Tan, Serena Jun-Wei Wang
Systematic reviews play important roles, but manual screening can be time-consuming given a growing literature. There is a need to use and evaluate automated strategies to accelerate systematic reviews. Here, we comprehensively tested machine learning (ML) models from classical and deep learning model families. We also assessed the performance of prompt engineering via few-shot learning of the GPT-3.5 and GPT-4 large language models (LLMs). We further attempted to understand when ML models can help automate screening. These ML models were applied to actual datasets of systematic reviews in education. Results showed that the performance of classical and deep ML models varied widely across datasets, ranging from 1.2% to 75.6% of work saved at 95% recall. LLM prompt engineering produced similarly wide performance variation. We searched for various indicators of whether and how ML screening can help. We discovered that the separability of clusters of relevant versus irrelevant articles in high-dimensional embedding space can strongly predict whether ML screening will help (overall R = 0.81). This simple and generalizable heuristic applied well across datasets and ML model families. In conclusion, ML screening performance varies tremendously, but researchers and software developers can consider using our cluster separability heuristic in various ways in an ML-assisted screening pipeline.
{"title":"Can machine learning help accelerate article screening for systematic reviews? Yes, when article separability in embedding space is high.","authors":"Farhan Ali, Amanda Swee-Ching Tan, Serena Jun-Wei Wang","doi":"10.1017/rsm.2024.16","DOIUrl":"10.1017/rsm.2024.16","url":null,"abstract":"<p><p>Systematic reviews play important roles but manual efforts can be time-consuming given a growing literature. There is a need to use and evaluate automated strategies to accelerate systematic reviews. Here, we comprehensively tested machine learning (ML) models from classical and deep learning model families. We also assessed the performance of prompt engineering via few-shot learning of GPT-3.5 and GPT-4 large language models (LLMs). We further attempted to understand when ML models can help automate screening. These ML models were applied to actual datasets of systematic reviews in education. Results showed that the performance of classical and deep ML models varied widely across datasets, ranging from 1.2 to 75.6% of work saved at 95% recall. LLM prompt engineering produced similarly wide performance variation. We searched for various indicators of whether and how ML screening can help. We discovered that the separability of clusters of relevant versus irrelevant articles in high-dimensional embedding space can strongly predict whether ML screening can help (overall <i>R</i> = 0.81). This simple and generalizable heuristic applied well across datasets and different ML model families. In conclusion, ML screening performance varies tremendously, but researchers and software developers can consider using our cluster separability heuristic in various ways in an ML-assisted screening pipeline.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"16 1","pages":"194-210"},"PeriodicalIF":6.1,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12621506/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A practical guide to evaluating sensitivity of literature search strings for systematic reviews using relative recall.
Pub Date: 2025-01-01 | Epub Date: 2025-03-07 | DOI: 10.1017/rsm.2024.6
Malgorzata Lagisz, Yefeng Yang, Sarah Young, Shinichi Nakagawa
Systematic searches of published literature are a vital component of systematic reviews. When search strings are not "sensitive," they may miss many relevant studies, limiting, or even biasing, the range of evidence available for synthesis. Concerningly, conducting and reporting evaluations (validations) of the sensitivity of the search strings used is rare, according to our survey of published systematic reviews and protocols. Potential reasons may involve a lack of familiarity with, or the inaccessibility of, complex sensitivity evaluation approaches. We first clarify the main concepts and principles of search string evaluation. We then present a simple procedure for estimating the relative recall of a search string, based on a pre-defined set of "benchmark" publications. The relative recall, that is, the sensitivity of the search string, is the retrieval overlap between the evaluated search string and a search string that captures only the benchmark publications. If there is little overlap (i.e., low recall or sensitivity), the evaluated search string should be improved to ensure that most of the relevant literature can be captured. The presented benchmarking approach can be applied to one or more online databases or search platforms, and is illustrated by five accessible, hands-on tutorials for commonly used online literature sources. Overall, our work provides an assessment of the current state of search string evaluations in published systematic reviews and protocols. It also paves the way to improved evaluation and reporting practices that make evidence synthesis more transparent and robust.
{"title":"A practical guide to evaluating sensitivity of literature search strings for systematic reviews using relative recall.","authors":"Malgorzata Lagisz, Yefeng Yang, Sarah Young, Shinichi Nakagawa","doi":"10.1017/rsm.2024.6","DOIUrl":"10.1017/rsm.2024.6","url":null,"abstract":"<p><p>Systematic searches of published literature are a vital component of systematic reviews. When search strings are not \"sensitive,\" they may miss many relevant studies limiting, or even biasing, the range of evidence available for synthesis. Concerningly, conducting and reporting evaluations (validations) of the sensitivity of the used search strings is rare, according to our survey of published systematic reviews and protocols. Potential reasons may involve a lack of familiarity or inaccessibility of complex sensitivity evaluation approaches. We first clarify the main concepts and principles of search string evaluation. We then present a simple procedure for estimating a relative recall of a search string. It is based on a pre-defined set of \"benchmark\" publications. The relative recall, that is, the sensitivity of the search string, is the retrieval overlap between the evaluated search string and a search string that captures only the benchmark publications. If there is little overlap (i.e., low recall or sensitivity), the evaluated search string should be improved to ensure that most of the relevant literature can be captured. The presented benchmarking approach can be applied to one or more online databases or search platforms. It is illustrated by five accessible, hands-on tutorials for commonly used online literature sources. Overall, our work provides an assessment of the current state of search string evaluations in published systematic reviews and protocols. It also paves the way to improve evaluation and reporting practices to make evidence synthesis more transparent and robust.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"16 1","pages":"1-14"},"PeriodicalIF":6.1,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12621535/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automated citation searching in systematic review production: A simulation study.
Pub Date: 2025-01-01 | Epub Date: 2025-03-07 | DOI: 10.1017/rsm.2024.15
Darren Rajit, Lan Du, Helena Teede, Joanne Enticott
Bibliographic aggregators like OpenAlex and Semantic Scholar offer scope for automated citation searching within systematic review production, promising increased efficiency. This study aimed to evaluate the performance of automated citation searching compared to standard search strategies and to examine factors that influence performance. Automated citation searching was simulated on 27 systematic reviews across the OpenAlex and Semantic Scholar databases, spanning three study areas (health, environmental management, and social policy). Performance, measured by recall (the proportion of relevant articles identified), precision (the proportion of articles identified that were relevant), and F1-F3 scores (weighted harmonic means of recall and precision), was compared with the performance of the search strategies originally employed by each systematic review. The associations between systematic review study area, number of included articles, number of seed articles, seed article type, study type inclusion criteria, API choice, and performance were analyzed. Automated citation searching outperformed the reference standard in terms of precision (p < 0.05) and F1 score (p < 0.05) but failed to outperform in terms of recall (p < 0.05) and F3 score (p < 0.05). Study area influenced performance, which was higher in environmental management than in social policy. Because of its inferior recall and F3 score, automated citation searching is best used as a supplementary search strategy in systematic review production, where recall is more important than precision. However, the observed outperformance in terms of F1 score and precision suggests that automated citation searching could be helpful in contexts where precision is as important as recall.
{"title":"Automated citation searching in systematic review production: A simulation study.","authors":"Darren Rajit, Lan Du, Helena Teede, Joanne Enticott","doi":"10.1017/rsm.2024.15","DOIUrl":"10.1017/rsm.2024.15","url":null,"abstract":"<p><p>Bibliographic aggregators like OpenAlex and Semantic Scholar offer scope for automated citation searching within systematic review production, promising increased efficiency. This study aimed to evaluate the performance of automated citation searching compared to standard search strategies and examine factors that influence performance. Automated citation searching was simulated on 27 systematic reviews across the OpenAlex and Semantic Scholar databases, across three study areas (health, environmental management and social policy). Performance, measured by recall (proportion of relevant articles identified), precision (proportion of relevant articles identified from all articles identified), and F1-F3 scores (weighted average of recall and precision), was compared to the performance of search strategies originally employed by each systematic review. The associations between systematic review study area, number of included articles, number of seed articles, seed article type, study type inclusion criteria, API choice, and performance was analyzed. Automated citation searching outperformed the reference standard in terms of precision (p < 0.05) and F1 score (p < 0.05) but failed to outperform in terms of recall (p < 0.05) and F3 score (p < 0.05). Study area influenced the performance of automated citation searching, with performance being higher within the field of environmental management compared to social policy. Automated citation searching is best used as a supplementary search strategy in systematic review production where recall is more important that precision, due to inferior recall and F3 score. However, observed outperformance in terms of F1 score and precision suggests that automated citation searching could be helpful in contexts where precision is as important as recall.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"16 1","pages":"211-227"},"PeriodicalIF":6.1,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12621532/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Capturing causal claims: A fine-tuned text mining model for extracting causal sentences from social science papers.
Pub Date: 2025-01-01 | Epub Date: 2025-03-10 | DOI: 10.1017/rsm.2024.13
Rasoul Norouzi, Bennett Kleinberg, Jeroen K Vermunt, Caspar J van Lissa
Understanding causality is crucial for social scientific research to develop strong theories and inform practice. However, explicit discussion of causality is often lacking in social science literature due to ambiguous causal language. This paper introduces a text mining model fine-tuned to extract causal sentences from full-text social science papers. A dataset of 529 causal and 529 non-causal sentences manually annotated from the Cooperation Databank (CoDa) was curated to train and evaluate the model. Several pre-trained language models (BERT, SciBERT, RoBERTa, LLAMA, and Mistral) were fine-tuned on this dataset and general-purpose causality datasets. Model performance was evaluated on held-out social science and general-purpose test sets. Results showed that fine-tuning transformer models on the social science dataset significantly improved causal sentence extraction, even with limited data, compared to the models fine-tuned only on the general-purpose data. Results indicate the importance of domain-specific fine-tuning and data for accurately capturing causal language in academic writing. This automated causal sentence extraction method enables comprehensive, large-scale analysis of causal claims across the social sciences. By systematically cataloging existing causal statements, this work lays the foundation for further research to uncover the mechanisms underlying social phenomena, inform theory development, and strengthen the methodological rigor of the field.
{"title":"Capturing causal claims: A fine-tuned text mining model for extracting causal sentences from social science papers.","authors":"Rasoul Norouzi, Bennett Kleinberg, Jeroen K Vermunt, Caspar J van Lissa","doi":"10.1017/rsm.2024.13","DOIUrl":"10.1017/rsm.2024.13","url":null,"abstract":"<p><p>Understanding causality is crucial for social scientific research to develop strong theories and inform practice. However, explicit discussion of causality is often lacking in social science literature due to ambiguous causal language. This paper introduces a text mining model fine-tuned to extract causal sentences from full-text social science papers. A dataset of 529 causal and 529 non-causal sentences manually annotated from the Cooperation Databank (CoDa) was curated to train and evaluate the model. Several pre-trained language models (BERT, SciBERT, RoBERTa, LLAMA, and Mistral) were fine-tuned on this dataset and general-purpose causality datasets. Model performance was evaluated on held-out social science and general-purpose test sets. Results showed that fine-tuning transformer models on the social science dataset significantly improved causal sentence extraction, even with limited data, compared to the models fine-tuned only on the general-purpose data. Results indicate the importance of domain-specific fine-tuning and data for accurately capturing causal language in academic writing. This automated causal sentence extraction method enables comprehensive, large-scale analysis of causal claims across the social sciences. By systematically cataloging existing causal statements, this work lays the foundation for further research to uncover the mechanisms underlying social phenomena, inform theory development, and strengthen the methodological rigor of the field.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"16 1","pages":"139-156"},"PeriodicalIF":6.1,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12621500/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Six ways to handle dependent effect sizes in meta-analytic structural equation modeling: Is there a gold standard?
Pub Date: 2025-01-01 | Epub Date: 2025-03-13 | DOI: 10.1017/rsm.2024.10
Zeynep Şiir Bilici, Wim Van den Noortgate, Suzanne Jak
The current meta-analytic structural equation modeling (MASEM) techniques cannot properly deal with cases where multiple effect sizes are available for the same relationship from the same study. Existing applications either treat these effect sizes as independent, randomly select one effect size amongst many, or create an average effect size. None of these approaches deals with the inherent dependency among the effect sizes, and each leads to biased estimates or to a loss of information and power. An alternative technique is to use univariate three-level modeling in the two-stage approach to model these dependencies. These different strategies for dealing with dependent effect sizes in the context of MASEM have not previously been compared in a simulation study. This study aims to compare the performance of these strategies across different conditions, varying the number of studies, the number of dependent effect sizes within studies, the correlation between the dependent effect sizes, the magnitude of the path coefficient, and the between-studies variance. We examine the relative bias in parameter estimates and standard errors, coverage proportions of confidence intervals, as well as mean standard error and power as measures of efficiency. The results suggest that no single method performs well across all these criteria, pointing to the need for better methods.
{"title":"Six ways to handle dependent effect sizes in meta-analytic structural equation modeling: Is there a gold standard?","authors":"Zeynep Şiir Bilici, Wim Van den Noortgate, Suzanne Jak","doi":"10.1017/rsm.2024.10","DOIUrl":"10.1017/rsm.2024.10","url":null,"abstract":"<p><p>The current meta-analytic structural equation modeling (MASEM) techniques cannot properly deal with cases where there are multiple effect sizes available for the same relationship from the same study. Existing applications either treat these effect sizes as independent, randomly select one effect size amongst many, or create an average effect size. None of these approaches deal with the inherent dependency in effect sizes, and either leads to biased estimates or loss of information and power. An alternative technique is to use univariate three-level modeling in the two-stage approach to model these dependencies. These different strategies for dealing with dependent effect sizes in the context of MASEM have not been previously compared in a simulation study. This study aims to compare the performance of these strategies across different conditions; varying the number of studies, the number of dependent effect sizes within studies, the correlation between the dependent effect sizes, the magnitude of the path coefficient, and the between-studies variance. We examine the relative bias in parameter estimates and standard errors, coverage proportions of confidence intervals, as well as mean standard error and power as measures of efficiency. The results suggest that there is not one method that performs well across all these criteria, pointing to the need for better methods.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"16 1","pages":"60-86"},"PeriodicalIF":6.1,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12621510/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prioritizing qualitative meta-synthesis findings in a mixed methods systematic review study: A description of the method.
Pub Date: 2025-01-01 | Epub Date: 2025-04-01 | DOI: 10.1017/rsm.2024.8
Robin Coatsworth-Puspoky, Wendy Duggleby, Sherry Dahlke, Kathleen F Hunter
Aim(s): To describe a sequential mixed methods review method that prioritized synthesized qualitative evidence from primary studies to explain the complexities of the unplanned readmission experiences of older persons with multiple chronic conditions.
Background: Segregated mixed methods review studies frequently prioritize quantitative evidence synthesis to examine the effectiveness of interventions, using qualitative evidence to explain the quantitative data. There is a lack of guidance about how to prioritize qualitative evidence.
Results: Five procedural steps were developed to prioritize qualitative evidence synthesis. In Step 1, research questions were developed. In Step 2, databases were searched, studies were mapped to their method (qualitative or quantitative) and appraised. In Step 3, meta-synthesis and applied thematic analysis were used to synthesize extracted qualitative evidence about the psychosocial processes and factors that influenced unplanned readmission. In Step 4, quantitative evidence was synthesized using vote counting to determine the factors influencing unplanned readmission. In Step 5, a matrix was used to compare, determine the agreement between the qualitative and quantitative evidence, juxtapose findings, and uphold validity. Factors were mapped to the model of psychosocial processes and analytic themes.
Conclusion: Prioritizing qualitative evidence synthesis in a mixed methods review study centers participants' experiences, perspectives, and voices, supporting the understanding of complex clinical problems from the standpoint of those who experienced the event. Synthesizing and integrating evidence facilitates the construction of holistic new understandings about phenomena and expands mixed methods systematic review methods.
Implications: Prioritizing patients' perspectives is useful for developing new client-centered interventions, establishing best practices for future reviews, generating theories, and expanding research methods.
{"title":"Prioritizing qualitative meta-synthesis findings in a mixed methods systematic review study: A description of the method.","authors":"Robin Coatsworth-Puspoky, Wendy Duggleby, Sherry Dahlke, Kathleen F Hunter","doi":"10.1017/rsm.2024.8","DOIUrl":"10.1017/rsm.2024.8","url":null,"abstract":"<p><strong>Aim(s): </strong>To describe a sequential mixed methods review method that prioritized synthesized qualitative evidence from primary studies to explain the complexities of older persons with multiple chronic conditions' unplanned readmission experiences.</p><p><strong>Background: </strong>Segregated mixed methods review studies frequently prioritize quantitative evidence synthesis to examine the effectiveness of interventions; utilizing qualitative evidence to explain quantitative data. There is a lack of guidance about how to prioritize qualitative evidence.</p><p><strong>Results: </strong>Five procedural steps were developed to prioritize qualitative evidence synthesis. In Step 1, research questions were developed. In Step 2, databases were searched, studies were mapped to their method (qualitative or quantitative) and appraised. In Step 3, meta-synthesis and applied thematic analysis were used to synthesize extracted qualitative evidence about the psychosocial processes and factors that influenced unplanned readmission. In Step 4, quantitative evidence was synthesized using vote counting to determine the factors influencing unplanned readmission. In Step 5, a matrix was used to compare, determine the agreement between the qualitative and quantitative evidence, juxtapose findings, and uphold validity. Factors were mapped to the model of psychosocial processes and analytic themes.</p><p><strong>Conclusion: </strong>Prioritizing qualitative evidence synthesis in a mixed methods review study prioritizes participants' experiences, perspectives, and voices to understand complex clinical problems from participants who experienced the event. Synthesizing and integrating evidence facilitates the construction of holistic new understandings about phenomenon and expands mixed methods systematic review methods.</p><p><strong>Implications: </strong>Prioritizing patients' perspectives is useful for developing new client-centered interventions, establishing best practices for future reviews, generating theories, and expanding research methods.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"16 1","pages":"123-138"},"PeriodicalIF":6.1,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12631148/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reducing the biases of the conventional meta-analysis of correlations.
Pub Date: 2025-01-01 | Epub Date: 2025-04-01 | DOI: 10.1017/rsm.2024.5
T D Stanley, Hristos Doucouliagos, Tomas Havranek
Conventional meta-analyses (both fixed and random effects) of correlations are biased due to the mechanical relationship between the estimated correlation and its standard error. Simulations closely calibrated to match actual research conditions widely seen across correlational studies in psychology corroborate these biases and suggest two solutions: UWLS+3 and HS. UWLS+3 is a simple inverse-variance weighted average (the unrestricted weighted least squares) that adjusts the degrees of freedom and thereby reduces small-sample bias to scientific negligibility. Both UWLS+3 and the Hunter and Schmidt approach (HS) are less biased than conventional random-effects estimates of correlations and Fisher's z, whether or not there is publication selection bias. However, publication selection bias remains a ubiquitous source of bias and false-positive findings. Despite the relationship between the estimated correlation and its standard error in the absence of selective reporting, the precision-effect test/precision-effect estimate with standard error (PET-PEESE) nearly eradicates publication selection bias. Surprisingly, PET-PEESE keeps the rate of false positives (i.e., type I errors) within nominal levels under the typical conditions widely seen across psychological research, whether or not there is publication selection bias.
{"title":"Reducing the biases of the conventional meta-analysis of correlations.","authors":"T D Stanley, Hristos Doucouliagos, Tomas Havranek","doi":"10.1017/rsm.2024.5","DOIUrl":"10.1017/rsm.2024.5","url":null,"abstract":"<p><p>Conventional meta-analyses (both fixed and random effects) of correlations are biased due to the mechanical relationship between the estimated correlation and its standard error. Simulations that are closely calibrated to match actual research conditions widely seen across correlational studies in psychology corroborate these biases and suggest two solutions: UWLS<sub>+3</sub> and HS. UWLS<sub>+3</sub> is a simple inverse-variance weighted average (the unrestricted weighted least squares) that adjusts the degrees of freedom and thereby reduces small-sample bias to scientific negligibility. UWLS<sub>+3</sub> as well as the Hunter and Schmidt approach (HS) are less biased than conventional random-effects estimates of correlations and Fisher's <i>z</i>, whether or not there is publication selection bias. However, publication selection bias remains a ubiquitous source of bias and false-positive findings. Despite the relationship between the estimated correlation and its standard error in the absence of selective reporting, the precision-effect test/precision-effect estimate with standard error (PET-PEESE) nearly eradicates publication selection bias. Surprisingly, PET-PEESE keeps the rate of false positives (i.e., type I errors) within their nominal levels under the typical conditions widely seen across psychological research whether there is publication selection bias, or not.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"16 1","pages":"42-59"},"PeriodicalIF":6.1,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12631149/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}