Austin Bradford Hill’s ‘Environment and disease: Association or causation’
Wayne Hall
Addiction, volume 119, issue 2, pages 386-390. Published 2023-08-27.
DOI: 10.1111/add.16329
https://onlinelibrary.wiley.com/doi/10.1111/add.16329
Abstract
Austin Bradford Hill’s 1965 paper [1] on disease and the environment is one of the most widely cited papers in public health and related fields [2]. Hill’s nine ‘criteria’ for causal inferences were used in 87% of recent reviews of evidence on causality in population health [3]. In the addictions field, Hill’s paper has been used to evaluate causal explanations of: associations between psychosis and the use of cannabis and nicotine [4]; the effects of naloxone distribution on opioid overdose deaths [5]; the effects of a minimum unit price policy for alcohol on alcohol-related harm [6]; and the role played by vitamin E acetate in the 2019 outbreak of lung injury from vaping cannabis oil in the United States [7]. Hill’s ‘criteria’ have also been used in courts in the United States to assess causal claims about injuries caused by chemical exposures [8, 9].
Why has a paper that was based on a lecture given to occupational physicians over 50 years ago come to be regarded as an authoritative method for making causal inferences from observational data? Its continued role as a foundational citation classic is surprising because in the more than half-century since its publication there have been major advances in thinking about causal inference [10-12] and in the development of advanced statistical methods and research designs to improve the quality of causal inferences from observational studies [10, 11, 13-15].
Hill assumed that we have observations that: ‘… reveal an association between two variables, perfectly clear-cut and beyond what we would attribute to the play of chance’ and asked: ‘What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?’ ([1], p. 295).
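Hill outlined nine factors that he thought needed to be considered in making this decision: strength, consistency, specificity, temporality, biological gradient (dose–response), plausibility, coherence, experiment and analogy.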
Hill stressed (p. 299) that it was not possible to ‘lay down some hard-and-fast rules of evidence that must be obeyed before we accept cause and effect. None of my nine viewpoints can bring indisputable evidence for or against the cause-effect hypothesis and none can be required as a sine qua non. What they can do, with greater or less strength, is to help us make up our minds on the fundamental question—is there any other way of explaining the set of facts before us, is there any other answer equally, or more likely than cause and effect?’
Hill was also clear that tests of statistical significance could not directly answer causal questions. At most they ‘can, and should, remind us of the effects that the play of chance can create, and they will instruct us in the likely magnitude of those effects’ (p. 299).
Finally, Hill asked what action may flow from a conclusion that an association is causal. In ‘occupational medicine our object is usually to take action. If this be operative cause and that be a deleterious effect, then we shall wish to intervene to abolish or reduce death or disease’ (p. 300).
He suggested that we should use ‘differential standards before we convict’ depending upon the social consequences of the action we propose. We may, for example, be prepared to restrict the use of a drug to treat morning sickness in pregnant women on ‘slight evidence’. We may be prepared to ‘remove a probable occupational hazard on the basis of fair evidence’. We should, however, ‘need very strong evidence before we made people burn a fuel in their homes that they do not like or stop smoking the cigarettes and eating the fats and sugar that they do like… [but] this does not imply crossing every ‘t’, and swords with every critic, before we act’ (p. 300).
Hill’s paper has several strengths. First, he made clear that different types of evidence from multiple types of study needed to be considered when deciding whether an association was likely to be causal or not [8]. He emphasized the need to judge the degree of consilience between these diverse types of evidence when assessing whether causation provided the most plausible explanation of an association [8]. Secondly, he emphasized the provisional nature of causal inference while insisting that we often need to take action on the basis of incomplete evidence [2]. Thirdly, he argued that the strength of the evidence required to justify action varied with the costs and inconvenience of any changes required to produce its benefits [2].
The major limitation of Hill’s paper was its lack of an explicit justification for the guidelines. His analysis arose out of the debate in the United Kingdom and United States in the 1950s and 1960s regarding whether cigarette smoking was a cause of lung cancer and other diseases, a debate in which Hill and Doll’s epidemiological research played an important role [10, 16]. Hill’s approach expanded upon the five criteria (strength, consistency, specificity, temporality and coherence) that were proposed by Stallones [17] and included (without explicit justification) in the 1964 US Surgeon General’s (USSG) report on cigarette smoking and health [18].
Hill [1] used the 1964 USSG report to illustrate the role of several of his considerations, but he did not discuss its causal criteria [16]. His response to an inquiry from the librarian at the London School of Hygiene and Tropical Medicine about the source of his guidance was unilluminating (see [16]). It is also unclear whether an approach to causal inference derived from the debate about cigarette smoking and disease will be useful in assessing causal claims more generally [10].
Another major weakness of Hill’s analysis was that each of his considerations was highly qualified. Except for temporality, he acknowledged that each one may, or may not, indicate a causal relationship [19]. Critics have since raised doubts about the value of specificity, plausibility, coherence and analogy [10, 15, 19]. Most critically, Hill did not provide any rationale for deciding how his nine considerations should be used in arriving at a conclusion on the plausibility of a causal inference [8]. Indeed, he seemed to disavow the possibility of any ‘hard-and-fast rules’ [8].
Despite these weaknesses, and Hill’s cautionary concluding statement, his nine considerations have been widely used as ‘criteria for causal inference’ [2, 8]. Their application to controversial cases requires judgement, and it is unclear whether researchers with different views on a specific debate about causation would agree on how to judge the evidence against each of Hill’s considerations [10, 15]. Most often, Hill’s ‘criteria’ seem to have been used like a diagnostic checklist on the assumption that the more that are satisfied, the more plausible a causal inference is [8].
Scholars have attempted to improve upon Hill’s approach to causal inference (e.g. [20-25]), but none of their suggested alternative approaches has been widely adopted. I briefly summarize three.
Shimonovich et al. [15] compared Hill’s guidelines with more formalized approaches to causal inference, such as the use of directed acyclic graphs, sufficient-component causes (causal pies) and the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) approach. They found that the four approaches agreed on the value of: strength of association (including analysis of plausible confounding); temporality; plausibility; and evidence from experiments. They argued that consistency was better assessed by heterogeneity of effect sizes and suggested that specificity was rarely met and should be replaced by evidence that other plausible causes did not explain the association. They did not regard a dose–response relationship as strong evidence for causation because it could arise from confounding. They also concluded that Hill’s concepts of coherence and analogy were of limited utility in causal inference.
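The suggestion that consistency is better assessed by heterogeneity of effect sizes can be made concrete with a small sketch. The source does not say which statistic to use; the example below assumes the standard inverse-variance Cochran's Q and the I² statistic derived from it. The function name `heterogeneity` and the effect sizes are hypothetical and purely illustrative; they are not taken from any study.

```python
def heterogeneity(effects, std_errors):
    """Return (pooled effect, Cochran's Q, I-squared as a percentage).

    Uses fixed-effect inverse-variance weights, w_i = 1 / se_i**2.
    """
    weights = [1.0 / se**2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variation attributed to between-study
    # differences rather than sampling error, floored at zero.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i_squared


# Hypothetical log odds ratios and standard errors (NOT real data).
similar = heterogeneity([0.40, 0.42, 0.38, 0.41], [0.10, 0.12, 0.11, 0.10])
mixed = heterogeneity([0.10, 0.90, -0.20, 0.60], [0.10, 0.12, 0.11, 0.10])

print(f"similar studies: I^2 = {similar[2]:.0f}%")
print(f"mixed studies:   I^2 = {mixed[2]:.0f}%")
```

Under these assumptions, a set of studies reporting nearly identical effects yields a low I², whereas studies whose effects differ by far more than their standard errors yield a high one. This is a graded measure, unlike a yes/no judgement about whether findings are 'consistent'.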
Bird argued that Hill’s guidelines can rule out reverse causation (R) and chance (N) but are unable to exclude a common cause (C), because observational studies cannot exclude all possible common causes. Hill’s guidelines can help to eliminate a specific confounder, but they cannot exclude confounding by unmeasured variables in the way that a well-conducted experiment, such as a controlled clinical trial, can: for example, by randomizing cases to a treatment or comparison condition to eliminate all possible Cs.
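The contrast between observational data and a randomized experiment can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not an analysis from the source: the exposure A has no effect on the outcome B, an unmeasured common cause C drives both, and the function names, effect sizes and sample size are all hypothetical.

```python
import random
import statistics


def mean_difference(exposed, outcome):
    """Difference in mean outcome between exposed and unexposed units."""
    treated = [b for a, b in zip(exposed, outcome) if a]
    control = [b for a, b in zip(exposed, outcome) if not a]
    return statistics.fmean(treated) - statistics.fmean(control)


def simulate(n=20_000, randomized=False, seed=0):
    """Simulate a world where A has NO effect on B.

    An unmeasured common cause C influences B and, in the observational
    setting, also influences who ends up exposed to A.
    """
    rng = random.Random(seed)
    exposed, outcome = [], []
    for _ in range(n):
        c = rng.gauss(0.0, 1.0)               # unmeasured common cause C
        if randomized:
            a = rng.random() < 0.5            # coin flip, independent of C
        else:
            a = c + rng.gauss(0.0, 1.0) > 0   # self-selection depends on C
        b = 2.0 * c + rng.gauss(0.0, 1.0)     # B depends on C only, not on A
        exposed.append(a)
        outcome.append(b)
    return mean_difference(exposed, outcome)


observational = simulate(randomized=False)
trial = simulate(randomized=True)
print(f"observational difference: {observational:.2f}")
print(f"randomized difference:    {trial:.2f}")
```

In the observational arm the exposed group has a clearly higher mean outcome even though A does nothing, because C is unevenly distributed between the groups. Random assignment balances C on average without anyone needing to measure it, so the difference shrinks towards zero.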
It is perhaps time for the addictions field to abandon the uncritical use of ‘Hill’s criteria’ as a talisman for making causal judgements, not least because Hill disavowed this way of using what he described as nine ‘considerations’. We can still acknowledge that Hill’s paper made an historically important contribution to emerging thinking about how to draw causal inferences from observational evidence while appreciating that it was necessarily ‘an early rough cut’ ([2], p. 2). If a more formal approach to causal inference is needed, then a simpler approach would be to use Bird’s reformulation of Hill’s approach. This would evaluate the available evidence against the three alternative explanations of an association between A and B, asking whether: the association is unlikely to be due to chance (using evidence from statistical tests and systematic reviews); we can be confident that A precedes B (if A is a cause of B); and there is evidence that excludes credible common causes of both A and B, such as studies that use statistical methods to control for plausible confounders (e.g. regression-based approaches; propensity scores; instrumental variables; and formal methods based on directed acyclic graphs) [10, 11].
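As an illustration of what controlling for a plausible confounder involves, the sketch below uses stratification, the simplest relative of the regression-based approaches mentioned above. It is a hedged toy example rather than a method endorsed by the source: the binary confounder, the exposure probabilities, the effect sizes and the function names are all assumptions chosen for illustration.

```python
import random
import statistics
from collections import defaultdict


def simulate_measured_confounder(n=20_000, seed=1):
    """Return rows (c, a, b) where A has NO effect on B.

    A measured binary confounder C raises both the chance of exposure A
    and the level of outcome B.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = rng.random() < 0.5
        a = rng.random() < (0.8 if c else 0.2)
        b = (3.0 if c else 0.0) + rng.gauss(0.0, 1.0)
        rows.append((c, a, b))
    return rows


def crude_difference(rows):
    """Exposed-minus-unexposed difference in mean outcome, ignoring C."""
    treated = [b for _, a, b in rows if a]
    control = [b for _, a, b in rows if not a]
    return statistics.fmean(treated) - statistics.fmean(control)


def stratified_difference(rows):
    """Average of within-stratum differences, weighted by stratum size."""
    strata = defaultdict(list)
    for row in rows:
        strata[row[0]].append(row)
    total = len(rows)
    return sum(len(s) / total * crude_difference(s) for s in strata.values())


rows = simulate_measured_confounder()
crude = crude_difference(rows)
adjusted = stratified_difference(rows)
print(f"crude difference:    {crude:.2f}")
print(f"adjusted difference: {adjusted:.2f}")
```

Comparing like with like within levels of C removes the spurious association in this toy setting. The adjustment works only because C was measured and correctly specified, which is exactly the condition that cannot be guaranteed for every common cause in observational data.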
Wayne D Hall: Conceptualization (lead); data curation (lead); formal analysis (lead); writing—original draft (lead); writing—review and editing (lead).
About the journal:
Addiction publishes peer-reviewed research reports on pharmacological and behavioural addictions, bringing together research conducted within many different disciplines.
Its goal is to serve international and interdisciplinary scientific and clinical communication, to strengthen links between science and policy, and to stimulate and enhance the quality of debate. We seek submissions that are not only technically competent but are also original and contain information or ideas of fresh interest to our international readership. We seek to serve low- and middle-income (LAMI) countries as well as more economically developed countries.
Addiction’s scope spans human experimental, epidemiological, social science, historical, clinical and policy research relating to addiction, primarily but not exclusively in the areas of psychoactive substance use and/or gambling. In addition to original research, the journal features editorials, commentaries, reviews, letters, and book reviews.