Highlights: The net value of reducing decision uncertainty by collecting additional data is quantified by the expected net benefit of sampling (ENBS). This tutorial presents a general-purpose algorithm for computing the ENBS for collecting survival data along with a step-by-step implementation in R. The algorithm is based on recently published methods for simulating survival data and computing expected value of sample information that do not rely on the survival data to follow any particular parametric distribution and that can take into account any arbitrary censoring process. We demonstrate in a case study based on a previous cancer technology appraisal that ENBS calculations are useful not only for designing new studies but also for optimizing reimbursement decisions for new health technologies based on immature evidence from ongoing trials.
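The ENBS combines the population-scaled per-person EVSI with the cost of running the study. The tutorial's implementation is in R; the sketch below is a minimal Python illustration of the arithmetic only, and every function name, argument, and number in it is hypothetical rather than taken from the tutorial:

```python
def enbs(evsi_per_person, incidence, horizon_years, discount_rate,
         fixed_cost, cost_per_patient, n):
    """Expected net benefit of sampling for a study of size n (illustrative)."""
    # Population scale: discounted number of patients affected by the
    # decision over the time horizon.
    population = sum(incidence / (1 + discount_rate) ** t
                     for t in range(horizon_years))
    sampling_cost = fixed_cost + cost_per_patient * n
    return evsi_per_person * population - sampling_cost
```

A positive ENBS indicates the study is expected to be worth its cost; comparing ENBS across candidate sample sizes n identifies the economically optimal design.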
Background: Infectious disease (ID) models have been the backbone of policy decisions during the COVID-19 pandemic. However, models often overlook variation in disease risk, health burden, and policy impact across social groups. Nonetheless, social determinants are becoming increasingly recognized as fundamental to the success of control strategies overall and to the mitigation of disparities.
Methods: To underscore the importance of considering social heterogeneity in epidemiological modeling, we systematically reviewed ID modeling guidelines to identify reasons and recommendations for incorporating social determinants of health into models in relation to the conceptualization, implementation, and interpretation of models.
Results: After identifying 1,372 citations, we found 19 guidelines, of which 14 directly referenced at least 1 social determinant. Age (n = 11), sex and gender (n = 5), and socioeconomic status (n = 5) were the most commonly discussed social determinants. Specific recommendations were identified to consider social determinants to 1) improve the predictive accuracy of models, 2) understand heterogeneity of disease burden and policy impact, 3) contextualize decision making, 4) address inequalities, and 5) assess implementation challenges.
Conclusion: This study can support modelers and policy makers in taking into account social heterogeneity, to consider the distributional impact of infectious disease outbreaks across social groups as well as to tailor approaches to improve equitable access to prevention, diagnostics, and therapeutics.
Highlights: Infectious disease (ID) models often overlook the role of social determinants of health (SDH) in understanding variation in disease risk, health burden, and policy impact across social groups. In this study, we systematically review ID guidelines and identify key areas to consider SDH in relation to the conceptualization, implementation, and interpretation of models. We identify specific recommendations to consider SDH to improve model accuracy, understand heterogeneity, estimate policy impact, address inequalities, and assess implementation challenges.
Background: Noninvasive prenatal testing (NIPT) was developed to improve the accuracy of prenatal screening to detect chromosomal abnormalities. Published economic analyses have yielded different incremental cost-effective ratios (ICERs), leading to conclusions of NIPT being dominant, cost-effective, and cost-ineffective. These analyses have used different model structures, and the extent to which these structural variations have contributed to differences in ICERs is unclear.
Aim: To assess the impact of different model structures on the cost-effectiveness of NIPT for the detection of trisomy 21 (T21; Down syndrome).
Methods: A systematic review identified economic models comparing NIPT to conventional screening. The key variations in identified model structures were the number of health states and modeling approach. New models with different structures were developed in TreeAge and populated with consistent parameters to enable a comparison of the impact of selected structural variations on results.
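The demonstration models were built in TreeAge, but the core decision-tree calculation — expected costs and QALYs over probability-weighted branches, followed by an incremental comparison — can be sketched in a few lines. The branch probabilities, costs, and QALYs below are invented solely to illustrate the arithmetic and are not the study's parameters:

```python
def expected_outcome(branches):
    """Expected cost and QALYs for one decision-tree arm.

    branches: list of (probability, cost, qalys) tuples whose
    probabilities sum to 1 across the arm.
    """
    cost = sum(p * c for p, c, _ in branches)
    qalys = sum(p * q for p, _, q in branches)
    return cost, qalys

def icer(cost_new, qaly_new, cost_old, qaly_old):
    """Incremental cost-effectiveness ratio, with dominance handled."""
    if qaly_new >= qaly_old and cost_new <= cost_old:
        return "dominant"      # new option is at least as good and cheaper
    if qaly_new <= qaly_old and cost_new >= cost_old:
        return "dominated"     # new option is no better and more expensive
    return (cost_new - cost_old) / (qaly_new - qaly_old)

# Hypothetical arms: conventional screening vs. NIPT
c0, q0 = expected_outcome([(0.9, 100.0, 30.0), (0.1, 500.0, 10.0)])
c1, q1 = expected_outcome([(0.95, 400.0, 30.0), (0.05, 600.0, 10.0)])
```

The same expected-value logic underlies both the decision-tree and microsimulation structures compared in the study; the structures differ in how health states and patient trajectories are represented, not in this final incremental calculation.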
Results: The review identified 34 economic models. Based on these findings, demonstration models were developed: 1) a decision tree with 3 health states, 2) a decision tree with 5 health states, 3) a microsimulation with 3 health states, and 4) a microsimulation with 5 health states. The base-case ICER from each model was 1) USD$34,474 (2023)/quality-adjusted life-year (QALY), 2) USD$14,990 (2023)/QALY, 3) USD$54,983 (2023)/QALY, and 4) NIPT was dominated.
Conclusion: Model-structuring choices can have a large impact on the ICER and conclusions regarding cost-effectiveness, which may inadvertently affect policy decisions to support or not support funding for NIPT. The use of reference models could improve international consistency in health policy decision making for prenatal screening.
Highlights: NIPT is a clinical area in which a variety of modeling approaches have been published, with wide variation in reported cost-effectiveness. This study shows that when broader contextual factors are held constant, varying the model structure yields results that range from NIPT being less effective and more expensive than conventional screening (i.e., NIPT was dominated) through to NIPT being more effective and more expensive than conventional screening with an ICER of USD$54,983 (2023)/QALY. Model-structuring choices may inadvertently affect policy decisions to support or not support funding of NIPT. Reference models could improve international consistency in health policy decision making for prenatal screening.
Purpose: To develop a model that simulates radiologist assessments and use it to explore whether pairing readers based on their individual performance characteristics could optimize screening performance.
Methods: Logistic regression models were designed and used to model individual radiologist assessments. For model evaluation, model-predicted individual performance metrics and paired disagreement rates were compared against the observed data using Pearson correlation coefficients. The logistic regression models were subsequently used to simulate different screening programs with reader pairing based on individual true-positive rates (TPR) and/or false-positive rates (FPR). For this, retrospective results from breast cancer screening programs employing double reading in Sweden, England, and Norway were used. Outcomes of random pairing were compared against those composed of readers with similar and opposite TPRs/FPRs, with positive assessments defined by either reader flagging an examination as abnormal.
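Under the "positive if either reader flags" rule, the effect of a pairing strategy on the paired FPR can be illustrated with a toy calculation. This sketch assumes independent readers and uses made-up per-reader FPRs rather than the fitted logistic regression models of the study, but it reproduces the qualitative finding that pairing readers with similar FPRs lowers the combined FPR:

```python
def paired_rate(r1, r2):
    # Probability that at least one of two independent readers flags a case.
    return 1 - (1 - r1) * (1 - r2)

def mean_paired_fpr(fprs, pairing):
    # Average paired FPR over a set of reader pairs (index pairs into fprs).
    return sum(paired_rate(fprs[i], fprs[j]) for i, j in pairing) / len(pairing)

fprs = [0.02, 0.03, 0.05, 0.08, 0.10, 0.12]   # hypothetical per-reader FPRs, sorted
similar = [(0, 1), (2, 3), (4, 5)]            # adjacent readers: similar FPRs
opposite = [(0, 5), (1, 4), (2, 3)]           # extremes paired together
```

By the rearrangement inequality, pairing similar readers maximizes the average of the products (1 − r1)(1 − r2) and hence minimizes the mean paired FPR, which is consistent with the direction of the reported results.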
Results: The analysis data sets consisted of 936,621 (Sweden), 435,281 (England), and 1,820,053 (Norway) examinations. There was good agreement between the model-predicted and observed radiologists' TPR and FPR (r ≥ 0.969). Model-predicted negative-case disagreement rates showed high correlations (r ≥ 0.709), whereas positive-case disagreement rates had lower correlation levels due to sparse data (r ≥ 0.532). Pairing radiologists with similar FPR characteristics (Sweden: 4.50% [95% confidence interval: 4.46%-4.54%], England: 5.51% [5.47%-5.56%], Norway: 8.03% [7.99%-8.07%]) resulted in significantly lower FPR than with random pairing (Sweden: 4.74% [4.70%-4.78%], England: 5.76% [5.71%-5.80%], Norway: 8.30% [8.26%-8.34%]), reducing examinations sent to consensus/arbitration while the TPR did not change significantly. Other pairing strategies resulted in equal or worse performance than random pairing.
Conclusions: Logistic regression models accurately predicted screening mammography assessments and helped explore different radiologist pairing strategies. Pairing readers with similar modeled FPR characteristics reduced the number of examinations unnecessarily sent to consensus/arbitration without significantly compromising the TPR.
Highlights: A logistic-regression model can be derived that accurately predicts individual and paired reader performance during mammography screening reading. Pairing screening mammography radiologists with similar false-positive characteristics reduced false-positive rates with no significant loss in true positives and may reduce the number of examinations unnecessarily sent to consensus/arbitration.
Background: The expected value of sample information (EVSI) measures the expected benefits that could be obtained by collecting additional data. Estimating EVSI using the traditional nested Monte Carlo method is computationally expensive, but the recently developed Gaussian approximation (GA) approach can efficiently estimate EVSI across different sample sizes. However, the conventional GA may result in biased EVSI estimates if the decision models are highly nonlinear. This bias may lead to suboptimal study designs when GA is used to optimize the value of different studies. Therefore, we extend the conventional GA approach to improve its performance for nonlinear decision models.
Methods: Our method provides accurate EVSI estimates by approximating the conditional expectation of the benefit based on 2 steps. First, a Taylor series approximation is applied to estimate the conditional expectation of the benefit as a function of the conditional moments of the parameters of interest using a spline, which is fitted to the samples of the parameters and the corresponding benefits. Next, the conditional moments of parameters are approximated by the conventional GA and Fisher information. The proposed approach is applied to several data collection exercises involving non-Gaussian parameters and nonlinear decision models. Its performance is compared with the nested Monte Carlo method, the conventional GA approach, and the nonparametric regression-based method for EVSI calculation.
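The two-step logic can be sketched for a single scalar parameter. In this toy example, the priors, net-benefit functions, and prior effective sample size n0 are all invented, and a polynomial fit stands in for the spline: a regression metamodel approximates the conditional expectation of net benefit given the parameter, and a Gaussian-approximation shrinkage factor rescales the probabilistic-analysis samples toward the preposterior distribution of the posterior mean:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sd, n0 = 0.0, 1.0, 10           # prior on theta; n0 = prior effective sample size
theta = rng.normal(mu, sd, 50_000)  # probabilistic-analysis draws of theta

# Hypothetical net benefit of two decisions; decision 1 is nonlinear in theta.
nb = np.column_stack([np.zeros_like(theta), 500.0 * np.tanh(theta)])

def evsi_ga(n):
    # Step 1: regression metamodel for E[NB_d | theta], fitted to the
    # samples (a polynomial here, standing in for the spline).
    g = [np.polynomial.Polynomial.fit(theta, nb[:, d], deg=5) for d in range(2)]
    # Step 2: Gaussian approximation -- a study of size n pulls the posterior
    # mean of theta toward the data; its preposterior draws are the prior
    # samples shrunk by sqrt(n / (n + n0)).
    post_mean = mu + np.sqrt(n / (n + n0)) * (theta - mu)
    value_with_sample = np.mean(np.maximum(g[0](post_mean), g[1](post_mean)))
    value_now = max(nb[:, 0].mean(), nb[:, 1].mean())
    return value_with_sample - value_now
```

EVSI estimated this way grows with n and approaches the expected value of partial perfect information for theta as n becomes large, since the shrinkage factor tends to 1.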
Results: The proposed approach provides accurate EVSI estimates across different sample sizes when the parameters of interest are non-Gaussian and the decision models are nonlinear. The computational cost of the proposed method is similar to that of other novel methods.
Conclusions: The proposed approach can estimate EVSI across sample sizes accurately and efficiently, which may support researchers in determining an economically optimal study design using EVSI.
Highlights: The Gaussian approximation method efficiently estimates the expected value of sample information (EVSI) for clinical trials with varying sample sizes, but it may introduce bias when health economic models have a nonlinear structure. We introduce the spline-based Taylor series approximation method and combine it with the original Gaussian approximation to correct the nonlinearity-induced bias in EVSI estimation. Our approach can provide more accurate EVSI estimates for complex decision models without sacrificing computational efficiency, which can enhance resource allocation strategies from a cost-effectiveness perspective.
Background: Recent developments in causal inference and machine learning (ML) allow for the estimation of individualized treatment effects (ITEs), which reveal whether treatment effectiveness varies according to patients' observed covariates. ITEs can be used to stratify health policy decisions according to individual characteristics and potentially achieve greater population health. Little is known about the appropriateness of available ML methods for use in health technology assessment.
Methods: In this scoping review, we evaluate ML methods available for estimating ITEs, aiming to help practitioners assess their suitability in health technology assessment. We present a taxonomy of ML approaches, categorized by key challenges in health technology assessment using observational data, including handling time-varying confounding, handling time-to-event data, and quantifying uncertainty.
Results: We found a wide range of algorithms for simpler settings with baseline confounding and continuous or binary outcomes. Not many ML algorithms can handle time-varying or unobserved confounding, and at the time of writing, no ML algorithm was capable of estimating ITEs for time-to-event outcomes while accounting for time-varying confounding. Many of the ML algorithms that estimate ITEs in longitudinal settings do not formally quantify uncertainty around the point estimates.
Limitations: This scoping review may not cover all relevant ML methods and algorithms as they are continuously evolving.
Conclusions: Existing ML methods available for ITE estimation are limited in handling important challenges posed by observational data when used for cost-effectiveness analysis, such as time-to-event outcomes, time-varying and hidden confounding, or the need to estimate sampling uncertainty around the estimates.
Implications: ML methods are promising but need further development before they can be used to estimate ITEs for health technology assessments.
Highlights: Estimating individualized treatment effects (ITEs) using observational data and machine learning (ML) can support personalized treatment advice and help deliver more customized information on the effectiveness and cost-effectiveness of health technologies. ML methods for ITE estimation are mostly designed for handling confounding at baseline but not time-varying or unobserved confounding. The few models that account for time-varying confounding are designed for continuous or binary outcomes, not time-to-event outcomes. Not all ML methods for estimating ITEs can quantify the uncertainty of their predictions. Future work on developing ML that addresses the concerns summarized in this review is needed before these methods can be widely used in clinical and health technology assessment-like decision making.
Purpose: Decision models are time-consuming to develop; therefore, adapting previously developed models for new purposes may be advantageous. We provide methods to prioritize efforts to 1) update parameter values in existing models and 2) adapt existing models for distributional cost-effectiveness analysis (DCEA).
Methods: Methods exist to assess the influence of different input parameters on the results of a decision model, including value of information (VOI) and 1-way sensitivity analysis (OWSA). We apply 1) VOI to prioritize searches for additional information to update parameter values and 2) OWSA to prioritize searches for parameters that may vary by socioeconomic characteristics. We highlight the assumptions required and propose metrics that quantify the extent to which parameters in a model have been updated or adapted. We provide R code to quickly carry out the analysis given inputs from a probabilistic sensitivity analysis (PSA) and demonstrate our methods using an oncology case study.
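The VOI-based ranking step can be illustrated directly from PSA output. The article provides R code; the sketch below is a Python stand-in in which the parameter names, distributions, and net-benefit function are all hypothetical, and a cubic polynomial substitutes for the nonparametric regression usually used for single-parameter EVPPI. Each parameter's EVPPI is estimated by regressing incremental net benefit on that parameter, and parameters are then ranked by their share of the summed value:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000
psa = {                                   # hypothetical PSA draws
    "p_response": rng.beta(20, 30, n),
    "utility":    rng.beta(40, 10, n),
    "cost_tx":    rng.gamma(50, 640, n),
}
# Hypothetical incremental net benefit computed from the same PSA draws.
inb = 100_000 * psa["utility"] * psa["p_response"] - psa["cost_tx"]

def evppi_shares(psa, inb):
    """Rank parameters by their share of summed single-parameter EVPPI."""
    value_now = max(inb.mean(), 0.0)      # adopt the new option only if mean INB > 0
    evppi = {}
    for name, draws in psa.items():
        # Regression estimate of E[INB | parameter]; cubic polynomial here,
        # standing in for a GAM/spline metamodel.
        fitted = np.polynomial.Polynomial.fit(draws, inb, deg=3)(draws)
        evppi[name] = np.maximum(fitted, 0.0).mean() - value_now
    total = sum(evppi.values())
    return {k: v / total
            for k, v in sorted(evppi.items(), key=lambda kv: -kv[1])}
```

Cumulative shares from the ranked dictionary show how many parameters must be updated to cover, say, 95% of the identified value, mirroring the prioritization logic applied in the case study.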
Results: In our case study, updating 2 of 21 probabilistic model parameters addressed 71.5% of the total VOI and updating 3 addressed approximately 100% of the uncertainty. Our proposed approach suggests that these are the 3 parameters that should be prioritized. For model adaptation for DCEA, 46.3% of the total OWSA variation came from a single parameter, while the top 10 input parameters were found to account for more than 95% of the total variation, suggesting efforts should be aimed toward these.
Conclusions: These methods offer a systematic approach to guide research efforts in updating models with new data or adapting models to undertake DCEA. The case study demonstrated only very small gains from updating more than 3 parameters or adapting more than 10 parameters.
Highlights: It can require considerable analyst time to search for evidence to update a model or to adapt a model to take account of equity concerns. In this article, we provide a quantitative method to prioritize parameters to 1) update existing models to reflect potential new evidence and 2) adapt existing models to estimate distributional outcomes. We define metrics that quantify the extent to which the parameters in a model have been updated or adapted. We provide R code that can quickly rank parameter importance and calculate quality metrics using only the results of a standard probabilistic sensitivity analysis.