Applying difference-in-differences design in quality improvement and health systems research

IF 4.5 · Zone 2 (Medicine) · Q1 GERIATRICS & GERONTOLOGY · Journal of the American Geriatrics Society · Pub Date: 2024-09-06 · DOI: 10.1111/jgs.19180

Yucheng Hou PhD, MPP, Abdelaziz Alsharawy PhD

Journal of the American Geriatrics Society, vol. 73, no. 1, pp. 8-11. Journal Article. Full text: https://agsjournals.onlinelibrary.wiley.com/doi/10.1111/jgs.19180


Applying difference-in-differences design in quality improvement and health systems research

Assessing the effectiveness of a health system intervention when randomized controlled trials (RCTs) are infeasible has long been a challenge for clinicians, health economists, and health service researchers alike. Difference-in-differences (DID) is a quasi-experimental study design that can be particularly appealing in addressing this challenge using observational data. Other nonexperimental study designs, such as regression adjustment or propensity score matching, attempt to examine the impact of an intervention by only accounting for the observed differences between groups. In contrast, an appropriately designed DID study aims to exploit randomness in intervention timing to identify the causal effects of the intervention. The number of published papers applying DID designs in the medical field has been increasing in recent years.^{1, 2} Following this trend, the Journal of the American Geriatrics Society (JAGS) published 18 studies, mostly since 2018 (original data from the authors), that apply DID designs to examine a wide range of health system interventions that pertain to geriatrics care (Figure 1).

DID designs assess the effect of an intervention (e.g., health policy or program) applied to one or more groups (treated) by comparing their outcomes relative to a group that has never or not yet received the intervention (control) in terms of two differences.^{3, 4} The first set of differences compares outcomes before and after the timing of the intervention for the treated and control groups, respectively. This process removes the observed and unobserved group-specific factors that do not change over time. Subtracting these differences (i.e., the second difference or difference-in-differences) removes the time-varying trends that are common to both groups. Together, DID identifies the causal effect of the intervention assuming that the treated would have experienced the same trend as the control group in the absence of the intervention (parallel trends).
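The two differences described above can be made concrete with a minimal sketch. The function name `did_estimate` and the outcome values are hypothetical and purely illustrative; they are not taken from any study discussed here, and the sketch covers only the simplest two-group, two-period case.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences from raw outcomes."""
    # First differences: before/after change within each group.
    # Each removes group-specific factors that do not change over time.
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    # Second difference: subtracting the control change removes the
    # time trend that is common to both groups.
    return treated_change - control_change


# Hypothetical outcome values (e.g., facility-free days); illustrative only.
treated_pre = [20.0, 22.0, 24.0]
treated_post = [27.0, 29.0, 31.0]
control_pre = [18.0, 20.0, 22.0]
control_post = [21.0, 23.0, 25.0]

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(effect)  # treated change of 7 minus control change of 3
```

The result can be read as a causal effect only if the parallel trends assumption holds, that is, only if the treated group would also have changed by 3 without the intervention.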

In the recent issue of JAGS, a study by Burke and colleagues⁵ used a DID design to evaluate changes in patient care outcomes following the Age-Friendly health systems recognition in the Veterans Health Administration. The authors incorporated recent advances in DID with staggered treatment timing developed by Sun and Abraham,⁶ which is appropriate because the medical sites received recognition at different times. This approach addresses potential biases in traditional DID estimation (often referred to as two-way fixed effects) that arise when treatment effects are not constant over time and differ between late- and early-treated sites.^{6-8} Given the absence of an RCT in this setting, one of the notable strengths of this study stems from using observational data to measure the effect of recognition for implementing evidence-based care transformations (4Ms: what Matters, Medication, Mentation, and Mobility) on geriatric care outcomes. While the findings clearly describe a positive association between Age-Friendly recognition and facility-free days, readers are met with a typical conundrum when DID designs are adopted: Can we interpret these relationships as causal effects?

Answering this question requires making an explicit argument, both statistical and conceptual, for the plausibility of the core DID assumptions. Typically, studies using DID devote substantial attention to arguing for the validity of the parallel trend assumption by demonstrating similar trends in outcomes between treated and control groups prior to the intervention (hereinafter, pre-trend tests). If a healthcare outcome (e.g., number of facility-free days) was already increasing (or decreasing) for the treated relative to the control group before the intervention, then differences observed after the intervention may merely be a continuation of the pre-trend rather than the treatment effect. Although most publications applying DID in JAGS discussed or visually assessed parallel pre-trends, only a handful reported statistical tests (Figure 1). Visual inspections, however intuitive, may mask differential trends leading up to the intervention or may be too noisy to provide a compelling critique of the pre-trend. Statistical tests that are adequately powered to detect differences in pre-trends between treated and control groups would be more transparent; recent literature has focused on diagnosing the power of pre-trend tests.^{9-11} If, however, nonparallel pre-trends are evident, adjusting for or matching on time-invariant observed characteristics measured at baseline that are associated with treatment status and the outcome trends may be justified.¹²
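One simple way to quantify a differential pre-trend is to fit a line through the treated-minus-control gap across pre-intervention periods. The sketch below is an illustration under stated assumptions: the helper name `differential_pretrend_slope` and all data are hypothetical, and it returns only a point estimate. A real pre-trend test would also need standard errors and, as the cited literature stresses, an assessment of statistical power.

```python
from statistics import mean


def differential_pretrend_slope(periods, treated_means, control_means):
    """OLS slope of the treated-minus-control gap over pre-intervention periods.

    A slope near zero is consistent with parallel pre-trends; a clearly
    nonzero slope means the groups were already diverging or converging.
    """
    gaps = [t - c for t, c in zip(treated_means, control_means)]
    p_bar, g_bar = mean(periods), mean(gaps)
    sxy = sum((p - p_bar) * (g - g_bar) for p, g in zip(periods, gaps))
    sxx = sum((p - p_bar) ** 2 for p in periods)
    return sxy / sxx


# Periods are measured relative to a hypothetical intervention at period 0.
pre_periods = [-4, -3, -2, -1]
control = [18.0, 19.0, 20.0, 21.0]

# Treated group moving in parallel with the control group (constant gap).
parallel = differential_pretrend_slope(
    pre_periods, [20.0, 21.0, 22.0, 23.0], control)
# Treated group already improving faster than the control group.
diverging = differential_pretrend_slope(
    pre_periods, [20.0, 21.5, 23.0, 24.5], control)

print(parallel, diverging)
```

In the diverging case the gap widens by half a unit per period before any intervention, so a post-intervention difference of similar size could simply continue that pre-trend.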

Nonetheless, even if the parallel trend appears to be satisfied prior to the intervention, does this criterion guarantee that a DID design can identify the causal impact of an intervention? Not necessarily. First, we want to distinguish between the parallel trend assumption and pre-trend tests.¹⁰ This distinction is important because the parallel trend assumption involves a counterfactual concept that is inherently untestable: What would have happened had the treated group not received the treatment? Pre-trend tests, if passed, do lend credibility to a DID design. Yet, a more critical question emerges when assessing the overall validity of the parallel trend assumption: What time-varying unobserved confounding factors may have led to or coincided with the intervention taking place? This question fundamentally pertains to the nature and timing of the intervention. Indeed, this core aspect of the DID assumption is more subtle and is prone to fail in many practical applications.¹ Beyond pre-trend tests, conceptual discussions with context-specific examinations are necessary for establishing the rationale for causal inference using DID designs.¹⁰ We next focus on three potential sources of bias that are commonly discussed in recent DID literature and can be particularly relevant in health systems research (Figure 2).
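The distinction between the assumption and the test can be written in potential-outcomes notation. The notation here is an assumption introduced for illustration and does not appear in the article: $Y_t(0)$ is the outcome in period $t$ without the intervention, $D = 1$ marks the treated group, $t = 0$ is the last pre-intervention period, and $t = 1$ is the post-intervention period.

```latex
% Parallel trend assumption (untestable): the left-hand side contains
% Y_1(0) for the treated group, which is never observed.
\[
  E\bigl[\,Y_{1}(0) - Y_{0}(0) \mid D = 1\,\bigr]
  \;=\;
  E\bigl[\,Y_{1}(0) - Y_{0}(0) \mid D = 0\,\bigr]
\]

% Pre-trend test (testable): the analogous equality between two
% pre-intervention periods, where both sides are observed if there
% is no anticipation.
\[
  E\bigl[\,Y_{0} - Y_{-1} \mid D = 1\,\bigr]
  \;=\;
  E\bigl[\,Y_{0} - Y_{-1} \mid D = 0\,\bigr]
\]
```

Passing the second equality makes the first more plausible but does not imply it, because a confounder that begins at the same time as the intervention leaves the pre-period comparison untouched.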

DID designs assume that the effect of an intervention begins only after it has been implemented. Health system interventions, however, often involve intrinsic or extrinsic incentives that can be anticipated by the treated group prior to the intervention. In particular, healthcare accreditations (e.g., age-friendly recognition) can be sought in response to changes in practices that are already in place and precede receiving such recognitions. The treatment effect in these settings can be biased because of changes leading up to the intervention (rather than intervention onset). Conceptual arguments for the absence of anticipation can describe context-specific challenges that make it difficult for participants or organizations to predict and influence future outcomes. Empirical falsification tests, such as shifting the timing of the intervention to a hypothetical period prior to the actual intervention onset, can be applied to examine the extent of anticipation.¹³ The anticipation from pre-periods can also be removed from the main treatment effect by including an additional interaction term between treated groups and indicators for a washout period leading up to the intervention.¹⁴
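The two empirical checks described above can be sketched on a toy two-group panel. The example is illustrative only: the function `did_estimate`, the onset period, and all outcome values are hypothetical. Dropping the washout period from the comparison is used here as a simplified stand-in for the washout interaction term, which it matches only in this simple two-group setting.

```python
from statistics import mean

def did_estimate(rows, onset):
    """2x2 DID on (group, period, outcome) rows; periods >= onset are 'post'."""
    def cell(group, post):
        return mean(y for g, t, y in rows if g == group and (t >= onset) == post)
    return (cell("treated", True) - cell("treated", False)) - (
        cell("control", True) - cell("control", False)
    )

# Hypothetical data: both groups share a trend of +1 per period. The treated
# group anticipates the intervention (+0.5 in period 3) and experiences a
# true effect of +3 from the actual onset in period 4.
ACTUAL_ONSET = 4
WASHOUT = {3}
rows = []
for t in range(6):
    rows.append(("control", t, 10.0 + t))
    effect = 3.0 if t >= ACTUAL_ONSET else (0.5 if t in WASHOUT else 0.0)
    rows.append(("treated", t, 12.0 + t + effect))

# Main estimate, contaminated by anticipation in the pre-period.
main = did_estimate(rows, ACTUAL_ONSET)

# Placebo test: keep pre-intervention data only and pretend onset was period 3.
pre_rows = [r for r in rows if r[1] < ACTUAL_ONSET]
placebo = did_estimate(pre_rows, onset=3)

# Washout: exclude the anticipation period from the comparison.
no_washout = [r for r in rows if r[1] not in WASHOUT]
adjusted = did_estimate(no_washout, ACTUAL_ONSET)

print(main, placebo, adjusted)
```

With these made-up numbers the main estimate is 2.875, below the true effect of 3 because anticipation inflates the treated group's pre-period mean. The placebo estimate of 0.5 flags a change before the actual onset, and excluding the washout period recovers 3.0.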

Time-varying unobserved factors may influence program entry and exit (e.g., selective participation or dropouts) or lead to differential behaviors for the treated relative to the control. In healthcare settings, such selection can occur at multiple levels. Many health system interventions are voluntary in nature. At the organization level, high-performing organizations may voluntarily select into the program due to a motivation to improve performance or an expected financial return, which could lead to upward-biased estimates of effectiveness. On the other hand, organizations that underperform at baseline could also be selected to participate in the program, which may result in an observed short-term improvement that is likely driven by regression toward the mean. At the clinician level, patient selection can occur because clinicians may have private knowledge about their patients' risks that influences care outcomes. For studies that use disease diagnosis as the intervention, hidden biological mechanisms or unobserved comorbidities can often be the underlying driving force behind the effects following the diagnosis. The conceptual validity of the DID design will be improved if it is accompanied by a precise discussion of how the treated are selected to receive an intervention, together with a demonstration of equivalent patient compositions among treated and control groups before and after the timing of the intervention. Statistical falsification exercises on populations or services not intended for treatment may also help rule out unobserved changes occurring in a medical site around the same time as the intervention (e.g., focusing on non-elderly Veterans for Age-Friendly recognition).

Intervention spillover across treated and control groups is another factor that can compromise the validity of DID designs. The control group in studies assessing quality improvement may observe and learn from the treated group and change their behavior accordingly, which leads the DID estimate to understate the total effect attributable to the intervention. A more subtle form of spillover can also occur when the control group implements some, if not all, elements of the intervention while not being categorized as treated (e.g., applying aspects of geriatric care transformations without seeking a recognition status). Statistical evidence that geographic distance or physical separation limits learning between groups, or conceptual arguments that spillover channels are unlikely, can help bolster the credibility of the parallel trend assumption.
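The direction of the spillover bias can be seen in a short worked expression. The notation is introduced here for illustration and is not the authors': $\tau$ is the effect of the intervention on the treated group, $\delta$ is the spillover effect on the control group, and $\Delta$ is the change over time that both groups would have experienced without any intervention (i.e., parallel trends are assumed to hold).

```latex
\begin{aligned}
\underbrace{\bigl(\bar{Y}^{\text{treated}}_{\text{post}} - \bar{Y}^{\text{treated}}_{\text{pre}}\bigr)}_{\Delta + \tau}
\;-\;
\underbrace{\bigl(\bar{Y}^{\text{control}}_{\text{post}} - \bar{Y}^{\text{control}}_{\text{pre}}\bigr)}_{\Delta + \delta}
\;=\; \tau - \delta .
\end{aligned}
```

Under these assumptions, when the control group benefits in the same direction as the treated group ($0 < \delta < \tau$), the DID estimate $\tau - \delta$ is smaller than $\tau$, and it recovers $\tau$ only when $\delta = 0$.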

Causal inference is not binary. Study designs lie on a continuum: from nonexperimental designs that can only account for observed characteristics, to DID and other quasi-experimental designs that instead exploit a source of randomness and may account for unobserved differences, and eventually to RCTs that deliberately eliminate both observed and unobserved confounders (but can sometimes be infeasible to conduct). When appropriate, using quasi-experimental designs to approach questions in health systems research provides strong and actionable evidence. The level of credibility in causal inference designs, however, rests on the plausibility of the underlying assumptions, which need to be explicitly outlined, statistically assessed when possible, and contextualized given the specific intervention or setting. This is especially important as many medical journals that used to reserve causal language for RCT frameworks are now moving toward allowing causal claims in appropriately designed observational studies.¹⁵,¹⁶ DID methods are rapidly evolving and paving the way for more credible statistical estimation of intervention effects.¹¹ Yet, presenting conceptual arguments that motivate the use of DID designs in health systems research is perhaps even more important to enhance the quality of evaluations. Clinicians and healthcare professionals are uniquely positioned within health systems to spot contexts that can deliver promising DID designs for causal inference.

All authors contributed equally to the manuscript, including conceptualization, drafting, editing, and final approval.

The authors report no conflicts of interest to disclose.

None.

