Purpose: Individual-level state-transition microsimulations (iSTMs) have proliferated for economic evaluations in place of cohort state-transition models (cSTMs). Probabilistic economic evaluations quantify decision uncertainty and value of information (VOI). Previous studies show that iSTMs provide unbiased estimates of expected incremental net monetary benefits (EINMB), but statistical properties of iSTM-produced estimates of decision uncertainty and VOI remain uncharacterized.
Methods: We compare iSTM-produced estimates of decision uncertainty and VOI to corresponding cSTMs. For a 2-alternative decision and normally distributed incremental costs and benefits, we derive analytical expressions for the probability of being cost-effective and the expected value of perfect information (EVPI) for cSTMs and iSTMs, accounting for correlations in incremental outcomes at the population and individual levels. We use numerical simulations to illustrate our findings and explore the impact of relaxing normality assumptions or having >2 decision alternatives.
Results: iSTM estimates of decision uncertainty and VOI are biased but asymptotically consistent (i.e., bias approaches 0 as number of microsimulated individuals approaches infinity). Decision uncertainty depends on 1 tail of the INMB distribution (e.g., P[INMB < 0]), which depends on estimated variance (larger with iSTMs given first-order noise). While iSTMs overestimate EVPI, their direction of bias for the probability of being cost-effective is ambiguous. Bias is larger when uncertainties in incremental costs and effects are negatively correlated since this increases INMB variance.
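As a rough illustration of the mechanism described in these results, the sketch below uses the standard closed-form expressions for a normally distributed INMB in a 2-alternative decision. It assumes, purely for illustration, that first-order noise adds an independent variance term `sd_individual**2 / n_individuals` to the population-level INMB variance; the function names and numeric inputs are hypothetical and are not taken from the study.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_cost_effective(mean_inmb, sd_inmb):
    """P[INMB > 0] when INMB ~ Normal(mean_inmb, sd_inmb**2)."""
    return normal_cdf(mean_inmb / sd_inmb)


def evpi_two_alternatives(mean_inmb, sd_inmb):
    """EVPI = E[max(INMB, 0)] - max(E[INMB], 0) for a normal INMB."""
    z = mean_inmb / sd_inmb
    expected_max = mean_inmb * normal_cdf(z) + sd_inmb * normal_pdf(z)
    return expected_max - max(mean_inmb, 0.0)


def istm_sd(sd_population, sd_individual, n_individuals):
    """Assumed iSTM standard deviation: population-level uncertainty
    plus first-order (individual-level) noise averaged over n individuals."""
    return math.sqrt(sd_population ** 2 + sd_individual ** 2 / n_individuals)


if __name__ == "__main__":
    # Hypothetical inputs chosen only to make the pattern visible.
    mean_inmb, sd_population, sd_individual = 1000.0, 2000.0, 20000.0
    print("cSTM", prob_cost_effective(mean_inmb, sd_population),
          evpi_two_alternatives(mean_inmb, sd_population))
    for n in (100, 1_000, 10_000, 1_000_000):
        sd = istm_sd(sd_population, sd_individual, n)
        print("iSTM", n, prob_cost_effective(mean_inmb, sd),
              evpi_two_alternatives(mean_inmb, sd))
```

Under these assumptions, the inflated standard deviation always raises EVPI, whereas the probability of being cost-effective is pulled toward 0.5 from whichever side the mean INMB places it, which is one way to see why the direction of that bias is ambiguous.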
Conclusions: iSTMs are useful for probabilistic economic evaluations. While more samples at the population uncertainty level are interchangeable with more microsimulations for estimating EINMB, minimizing iSTM bias in estimating decision uncertainty and VOI depends on sufficient microsimulations. Analysts should account for this when allocating their computational budgets and, at minimum, characterize such bias in their reported results.
Highlights: Individual-level state-transition microsimulation models (iSTMs) produce biased but consistent estimates of the probability that interventions are cost-effective. iSTMs also produce biased but consistent estimates of the expected value of perfect information. The biases in these decision uncertainty and value-of-information measures are not reduced by more parameter sets being sampled from their population-level uncertainty distribution but rather by more individuals being microsimulated for each parameter set sampled. Analysts using iSTMs to quantify decision uncertainty and value of information should account for these biases when allocating their computational budgets and, at minimum, characterize such bias in their reported results.
Highlights: Our commentary proposes the application of directed acyclic graphs (DAGs) in the design of decision-analytic models, offering researchers a valuable and structured tool to enhance transparency and accuracy by bridging the gap between causal inference and model design in medical decision making. The practical examples in this article showcase the transformative effect DAGs can have on model structure, parameter selection, and the resulting conclusions on effectiveness and cost-effectiveness. This methodological article invites a broader conversation on decision-modeling choices grounded in causal assumptions.
Background: Mathematical models served a critical role in COVID-19 decision making throughout the pandemic. Model calibration is an essential, but often computationally burdensome, step in model development that provides estimates for difficult-to-measure parameters and establishes an up-to-date modeling platform for scenario analysis. In the evolving COVID-19 pandemic, frequent recalibration was necessary to provide ongoing support to decision makers. In this study, we address the computational challenges of frequent recalibration with a new calibration approach.
Methods: We calibrated and recalibrated an age-stratified dynamic compartmental model of COVID-19 in Minnesota to statewide COVID-19 cumulative mortality and prevalent age-specific hospitalizations from March 22, 2020 through August 20, 2021. This period was divided into 10 calibration periods, reflecting significant changes in policies, messaging, and/or epidemiological conditions in Minnesota. When recalibrating the model from one period to the next, we employed a sequential calibration approach that leveraged calibration results from previous periods and adjusted only parameters most relevant to the calibration target data of the new calibration period to improve computational efficiency. We compared computational burden and performance of the sequential calibration approach to a more traditional calibration method, in which all parameters were readjusted with each recalibration.
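The sequential idea described above can be sketched on a toy problem. The example below is a minimal illustration only, not the study's compartmental model or calibration algorithm: it assumes a hypothetical piecewise-exponential model with one growth rate per calibration period and uses a simple grid search. Parameters fitted in earlier periods are held fixed, and only the newest period's parameter is adjusted.

```python
def simulate(rates, period_length, initial):
    """Toy model (assumption): one growth rate per period, applied daily."""
    values = []
    current = initial
    for rate in rates:
        for _ in range(period_length):
            current *= 1.0 + rate
            values.append(current)
    return values


def sum_squared_error(simulated, observed):
    """Goodness of fit between simulated and observed target series."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))


def sequential_calibrate(observed, n_periods, period_length, initial, grid):
    """Fit one period at a time, reusing rates fitted in earlier periods."""
    fitted = []
    for period in range(n_periods):
        end = (period + 1) * period_length
        target = observed[:end]
        # Only the newest period's rate is searched; earlier rates stay fixed.
        best_rate = min(
            grid,
            key=lambda r: sum_squared_error(
                simulate(fitted + [r], period_length, initial), target
            ),
        )
        fitted.append(best_rate)
    return fitted


if __name__ == "__main__":
    true_rates = [0.10, -0.05, 0.02]  # hypothetical values
    grid = [i / 100 for i in range(-20, 21)]
    observed = simulate(true_rates, period_length=7, initial=100.0)
    print(sequential_calibrate(observed, 3, 7, 100.0, grid))
```

In this sketch each recalibration searches a single parameter, so the number of model runs grows linearly with the number of periods, whereas readjusting all parameters jointly on the same grid would grow multiplicatively.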
Results: Both calibration methods identified parameter sets closely reproducing prevalent hospitalizations and cumulative deaths over time. By the last calibration period, both approaches converged to similar parameter values. However, the sequential calibration approach identified parameter sets that more tightly fit calibration targets and required substantially less computation time than traditional calibration.
Conclusions: Sequential calibration is an efficient approach to maintaining up-to-date models with evolving, time-varying parameters and potentially identifies better-fitting parameter sets than traditional calibration.
Highlights: This study used a sequential calibration approach, which takes advantage of previous calibration results to reduce the number of parameters to be estimated in each round of calibration, improving computational efficiency and algorithm convergence to best-fitting parameter values. Both sequential and traditional calibration approaches were able to identify parameter sets that closely reproduced calibration targets. However, the sequential calibration approach generated parameter sets that yielded tighter fits and was less computationally burdensome. Sequential calibration is an efficient approach to maintaining up-to-date models with evolving, time-varying parameters.
Background: With advancing illness, some patients with heart failure (HF) opt to receive life-extending treatments despite their high costs, while others choose to forgo these treatments, emphasizing cost containment. We examined the association between patients' health status and their preferences for treatment cost containment versus life extension and whether patients' awareness of disease incurability moderated this association.
Methods: In a prospective cohort of patients (N = 231) with advanced HF in Singapore, we assessed patients' awareness of disease incurability, health status, and treatment preferences every 4 mo for up to 4 y (up to 13 surveys). Using random effects multinomial logistic regression models, we assessed whether patients' awareness of disease incurability moderated the association between their health status and treatment preferences.
Results: About half of the patients in our study lacked awareness of HF's incurability. Results from regression analyses showed that patients with better health status, as indicated by lower distress scores (odds ratio [OR] [95% confidence interval {CI}]: 0.862 [0.754, 0.985]) and greater physical well-being (1.12 [1.03, 1.21]), and who lacked awareness of their disease's incurability were more likely to prefer higher cost containment/minimal life extension treatments than lower cost containment/maximal life extension treatments.
Conclusions: This study underscores the significance of patients' awareness of disease incurability in shaping the relationship between their health status and treatment preferences. Our findings emphasize the need to incorporate illness education during goals-of-care conversations with patients and the importance of revisiting these conversations frequently to accommodate changing treatment preferences.
Highlights: The health status of patients with advanced heart failure was associated with their treatment preferences. Patients whose health status improved and who lacked awareness of their disease's incurability were more likely to prefer higher cost containment/minimal life extension treatments.
Background: Clinical uncertainty is associated with increased resource utilization, worsened health-related quality of life for patients, and provider burnout, particularly during critical illness. Existing data are limited, because determining uncertainty from notes typically requires manual, qualitative review. We sought to develop a consensus list of descriptors of clinical uncertainty and then, using a thematic analysis approach, describe how respondents consider their use in intensive care unit (ICU) notes, such that future work can extract uncertainty data at scale.
Design: We conducted a Delphi consensus study with physicians across multiple institutions nationally who care for critically ill patients or patients with advanced illnesses. Participants were given a definition for clinical uncertainty and collaborated through multiple rounds to determine which words represent uncertainty in clinician notes. We also administered surveys that included open-ended questions to participants about clinical uncertainty. Following derivation of a consensus list, we analyzed participant responses using thematic analysis to understand the role of uncertainty in clinical documentation.
Results: Nineteen physicians participated in at least 2 of the Delphi rounds. Consensus was achieved for 44 words or phrases over 5 rounds of the Delphi process. Clinicians described comfort with using uncertainty terms and used them in a variety of ways: documenting and processing the diagnostic thinking process, enlisting help, identifying incomplete information, and practicing transparency to reflect uncertainty that was present.
Conclusions: Using a consensus process, we created an uncertainty lexicon that can be used for uncertainty data extraction from the medical record. We demonstrate that physicians, particularly in the ICU, are comfortable with uncertainty and document uncertainty terms frequently to convey the complexity and ambiguity that are pervasive in critical illness.
Highlights: Question: What words do physicians caring for critically ill patients use to document clinical uncertainty, and why? Findings: A consensus list of 44 words or phrases was identified by a group of experts. Physicians expressed comfort with using these words in the electronic health record. Meaning: Physicians are comfortable with uncertainty words and document them frequently to convey the complexity and ambiguity that are pervasive in critical illness.
Objectives: (1) To demonstrate the use of quality-adjusted life-years (QALYs) as an outcome measure for comparing performance between simulation models and identifying the most accurate model for economic evaluation and health technology assessment. QALYs relate directly to decision making and combine mortality and diverse clinical events into a single measure using evidence-based weights that reflect population preferences. (2) To explore the usefulness of Q2, the proportional reduction in error, as a model performance metric and compare it with other metrics: mean squared error (MSE), mean absolute error, bias (mean residual), and R2.
Methods: We simulated all EXSCEL trial participants (N = 14,729) using the UK Prospective Diabetes Study Outcomes Model software versions 1 (UKPDS-OM1) and 2 (UKPDS-OM2). The EXSCEL trial compared once-weekly exenatide with placebo (median 3.2-y follow-up). Default UKPDS-OM2 utilities were used to estimate undiscounted QALYs over the trial period based on the observed events and survival. These were compared with the QALYs predicted by UKPDS-OM1/2 for the same period.
Results: UKPDS-OM2 predicted patients' QALYs more accurately than UKPDS-OM1 did (MSE: 0.210 v. 0.253; Q2: 0.822 v. 0.786). UKPDS-OM2 underestimated QALYs by an average of 0.127 versus 0.150 for UKPDS-OM1. UKPDS-OM2 predictions were more accurate for mortality, myocardial infarction, and stroke, whereas UKPDS-OM1 better predicted blindness and heart disease. Q2 facilitated comparisons between subgroups and (unlike R2) was lower for biased predictors.
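To make the comparison of metrics concrete, the sketch below computes them for paired observed and predicted values. It assumes Q2 takes the common proportional-reduction-in-error form, 1 minus MSE divided by the variance of the observed values, and that R2 is the squared Pearson correlation; the exact definitions used in the study may differ. The reported figures appear consistent with this form, since 0.210/(1 - 0.822) and 0.253/(1 - 0.786) both come to roughly 1.18. The residual sign convention (observed minus predicted) and the function name are also assumptions.

```python
def prediction_metrics(observed, predicted):
    """Return MSE, MAE, bias (mean residual), Q2, and R2 for paired values.

    Assumes residual = observed - predicted, so a positive bias means the
    model underestimates on average. Requires non-constant inputs.
    """
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    bias = sum(residuals) / n

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    var_obs = sum((o - mean_obs) ** 2 for o in observed) / n
    var_pred = sum((p - mean_pred) ** 2 for p in predicted) / n
    cov = sum(
        (o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted)
    ) / n

    q2 = 1.0 - mse / var_obs            # proportional reduction in error
    r2 = cov ** 2 / (var_obs * var_pred)  # squared Pearson correlation
    return {"mse": mse, "mae": mae, "bias": bias, "q2": q2, "r2": r2}


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]           # hypothetical QALYs
    shifted = [o - 0.5 for o in observed]     # systematically too low
    print(prediction_metrics(observed, shifted))
```

In the usage example, the predictions are perfectly correlated with the observations but systematically too low, so R2 stays at 1 while Q2 falls below 1, which mirrors the statement that Q2, unlike R2, is lower for biased predictors.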
Conclusions: Q2 for QALYs was useful for comparing global prediction accuracy (across all clinical events) of diabetes models. It could be used for model registries, choosing between simulation models for economic evaluation and evaluating the impact of recalibration. Similar methods could be used in other disease areas.
Highlights: Diabetes simulation models are currently validated by examining their ability to predict the incidence of individual events (e.g., myocardial infarction, stroke, amputation) or composite events (e.g., first major adverse cardiovascular event). We introduce Q2, the proportional reduction in error, as a measure that may be useful for evaluating and comparing the prediction accuracy of econometric or simulation models. We propose using the Q2 or mean squared error for QALYs as global measures of model prediction accuracy when comparing diabetes models' performance for health technology assessment; these can be used to select the most accurate simulation model for economic evaluation and to evaluate the impact of model recalibration in diabetes or other conditions.
Objectives: To examine awareness of disease incurability among patients with heart failure over 24 mo and its associations with patient characteristics and patient-reported outcomes (distress, emotional, and spiritual well-being).
Methods: This study analyzed 24-mo data from a prospective cohort study of 251 patients with heart failure (New York Heart Association class III/IV) recruited from inpatient wards in Singapore General Hospital and National Heart Centre Singapore. Patients were asked to report if their doctor told them they were receiving treatment to cure their condition. "No" responses were categorized as being aware of disease incurability, while "Yes" and "Uncertain" were categorized as being unaware and being uncertain about disease incurability, respectively. We used mixed-effects multinomial logistic regression to investigate the associations between awareness of disease incurability and patient characteristics and mixed-effects linear regressions to investigate associations with patient outcomes.
Results: The percentage of patients who were aware of disease incurability increased from 51.6% at baseline to 76.4% at 24-mo follow-up (P < 0.001). Compared with being unaware of disease incurability, being aware was associated with older age (relative risk ratio [RRR] = 1.04; P = 0.005), adequate self-care confidence (RRR = 5.06; P < 0.001), participation in treatment decision making (RRR = 2.13; P = 0.006), higher education (RRR = 2.00; P = 0.033), financial difficulty (RRR = 1.18; P = 0.020), symptom burden (RRR = 1.08; P = 0.001), and ethnicity (P < 0.05). Compared with being unaware of disease incurability, being aware was associated with higher emotional well-being (β = 0.76; P = 0.024), while being uncertain about disease incurability was associated with poorer spiritual well-being (β = -3.16; P = 0.006).
Conclusions: Our findings support the importance of being aware of disease incurability, addressing uncertainty around disease incurability among patients with heart failure, and helping patients make informed medical decisions. The findings are important to Asian and other cultures where prognosis disclosure to terminally ill patients is generally low, often with the intention to "protect" patients.
Highlights: Our 24-mo study with heart failure patients showed an increase from 52% to 76% in patients being aware of disease incurability. Compared with being unaware of disease incurability, being aware was associated with higher emotional well-being, while uncertainty about disease incurability was associated with poorer spiritual well-being.
Purpose: We aim to assess the performance of methods for adjusting estimates of treatment effectiveness for patient nonadherence in the context of health technology assessment using simulation methods.
Methods: We simulated trial datasets with nonadherence, prognostic characteristics, and a time-to-event outcome. The simulated scenarios were based on a trial investigating immunosuppressive treatments for improving graft survival in patients who had had a kidney transplant. The primary estimand was the difference in restricted mean survival times in all patients had there been no nonadherence. We compared generalized methods (g-methods; marginal structural model with inverse probability of censoring weighting [IPCW], structural nested failure time model [SNFTM] with g-estimation) and simple methods (intention-to-treat [ITT] analysis, per-protocol [PP] analysis) in 90 scenarios each with 1,900 simulations. The methods' performance was primarily assessed according to bias.
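For readers less familiar with the estimand, the sketch below shows one common way to compute a restricted mean survival time (RMST) as the area under a Kaplan-Meier curve up to a chosen horizon, and the difference between two arms. It is a minimal illustration with hypothetical function names and data, not the study's code. Applied to arms as randomized, it yields an ITT-style contrast; it does not itself perform the IPCW or g-estimation adjustments for nonadherence.

```python
def restricted_mean_survival_time(times, events, horizon):
    """Area under the Kaplan-Meier curve from 0 to `horizon`.

    `events[i]` is 1 if the event was observed at `times[i]`, 0 if censored.
    Subjects censored at a time are treated as at risk at that time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    area = 0.0
    previous_time = 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        area += survival * (t - previous_time)   # flat segment before t
        survival *= 1.0 - deaths / n_at_risk     # Kaplan-Meier step at t
        n_at_risk -= leaving
        previous_time = t
    area += survival * (horizon - previous_time)  # tail up to the horizon
    return area


def rmst_difference(treated, control, horizon):
    """Difference in RMST (treated minus control); each arm is (times, events)."""
    return (
        restricted_mean_survival_time(treated[0], treated[1], horizon)
        - restricted_mean_survival_time(control[0], control[1], horizon)
    )


if __name__ == "__main__":
    treated = ([2, 4, 6, 8], [1, 1, 1, 1])   # hypothetical data
    control = ([1, 2, 3, 4], [1, 1, 1, 1])
    print(rmst_difference(treated, control, horizon=8))
```

With no censoring and a horizon at or beyond the last event, the RMST reduces to the sample mean survival time, which gives a quick sanity check on the implementation.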
Results: In implementation nonadherence scenarios, the average percentage bias was 20% (ranging from 7% to 37%) for IPCW, 20% (8%-38%) for SNFTM, 20% (8%-38%) for PP, and 40% (20%-75%) for ITT. In persistence nonadherence scenarios, the average percentage bias was 26% (9%-36%) for IPCW, 26% (14%-39%) for SNFTM, 26% (14%-36%) for PP, and 47% (16%-72%) for ITT. In initiation nonadherence scenarios, the percentage bias ranged from -29% to 110% for IPCW, -34% to 108% for SNFTM, -32% to 102% for PP, and -18% to 200% for ITT.
Conclusion: In this study, g-methods and PP produced more accurate estimates of the treatment effect adjusted for nonadherence than the ITT analysis did. However, considerable bias remained in some scenarios.
Highlights: Randomized controlled trials are usually analyzed using the intention-to-treat (ITT) principle, which produces a valid estimate of effectiveness relating to the underlying trial, but when patient adherence to medications in the real world is known to differ from that observed in the trial, such estimates are likely to result in a biased representation of real-world effectiveness and cost-effectiveness. Our simulation study demonstrates that generalized methods (g-methods; IPCW, SNFTM) and per-protocol analysis provide more accurate estimates of the treatment effect than the ITT analysis does, when adjustment for nonadherence is required; however, even with these adjustment methods, considerable bias may remain in some scenarios. When real-world adherence is expected to differ from adherence observed in a trial, adjustment methods should be used to provide estimates of real-world effectiveness.